Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
19 topics
4 close collaborators

Published work

17 published items

Preprint · 2026 · arXiv

Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning

Large language models (LLMs) generate not only reasoning text, but also token-level confidence trajectories that record how uncertainty evolves during inference. Whether these trajectories are relevant to reasoning correctness remains unclear. Here we show that confidence trajectories encode a content-agnostic confidence geometry associated with trace-level final-answer correctness. Using only token-level confidence values, without access to the input question, reasoning text, hidden states, or external verifiers, we find that low-dimensional representations of confidence trajectories separate correct from incorrect reasoning traces. Across GSM8K, MATH, and MMLU, this geometric separation is quantitatively linked to downstream predictability: stronger clustering of correct and incorrect traces, measured by the Davies–Bouldin index, consistently corresponds to higher correctness-discrimination AUC. We further show that correctness-related information is enriched in the tail of reasoning, suggesting that late-stage confidence dynamics carry key correctness signals. We propose NeuralConf, a lightweight estimator that learns from confidence trajectories for correctness evaluation. Under a fixed trace budget, NeuralConf-derived scores improve confidence-weighted answer aggregation over majority voting, tail confidence, and other static baselines. These results reveal that LLMs expose trace-intrinsic statistical signals of correctness through their own confidence dynamics, offering a route to improve inference using information already present within generation.
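
A minimal Python sketch of the aggregation comparison described above. It assumes each sampled trace has already been reduced to a final answer and a scalar correctness score. NeuralConf itself is a learned estimator and is not reproduced here, so the hypothetical helper `tail_confidence` (mean of the last k token confidences, in the spirit of the tail-confidence baseline named in the abstract) stands in for the score; summing scores per answer is likewise an assumed weighting rule, not necessarily the paper's.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Most frequent final answer; ties go to the answer seen first."""
    counts = Counter(answers)
    return max(counts, key=counts.get)


def tail_confidence(token_conf, k=32):
    """Hypothetical static score: mean confidence over the last k tokens of a trace."""
    tail = token_conf[-k:]
    return sum(tail) / len(tail)


def weighted_vote(answers, scores):
    """Confidence-weighted aggregation: sum per-trace scores for each distinct answer."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Five sampled traces: three agree on "17", but with low tail confidence.
answers = ["42", "42", "17", "17", "17"]
traces = [[0.9, 0.95], [0.7, 0.9], [0.3, 0.2], [0.4, 0.2], [0.1, 0.1]]
scores = [tail_confidence(t, k=2) for t in traces]
print(majority_vote(answers), weighted_vote(answers, scores))  # 17 42
```

Majority voting follows the head count, while the weighted vote lets two confident traces outweigh three unconfident ones; a learned score would simply replace `tail_confidence`.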

Preprint · 2026 · arXiv

Cross-Modal Navigation with Multi-Agent Reinforcement Learning

Robust embodied navigation relies on complementary sensory cues. However, high-quality, well-aligned multi-modal data are often difficult to obtain in practice. Training a monolithic model is also challenging, as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight, modality-specialized agents offers a scalable paradigm: it enables flexible deployment and parallel execution while preserving the strengths of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for CROss-modal NAvigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with access to the global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve performance and efficiency over single-agent baselines. We find that homogeneous collaboration with limited modalities is sufficient for short-range navigation under salient cues; that heterogeneous collaboration among agents with complementary modalities is generally efficient and effective; and that navigation in large, complex environments requires both richer multi-modal perception and increased model capacity.

Preprint · 2026 · arXiv

Hybrid Disclination Skin-topological Effects in Non-Hermitian Circuits

The bulk-disclination correspondence (BDC) is a fundamental concept in Hermitian systems that has been widely applied to predict disclination states. Recently, disclination states have also been observed and experimentally verified in non-Hermitian systems with C6 lattice symmetry, where gain and loss are introduced to induce non-Hermiticity. In this Letter, we propose a non-Hermitian two-dimensional (2D) Su-Schrieffer-Heeger (SSH) disclination model with skin-topological (ST) disclination states, and calculate its biorthogonal Zak phase. Together with the real-space disclination index, we predict the emergence of disclination states in a C4-symmetric non-Hermitian lattice and the corresponding fractional charge. We also generalize the symmetry indicator within the biorthogonal framework to predict the anomalous filling near the disclination core. Experimentally, the model is implemented on a nonreciprocal circuit platform, where we analyze the impedance matrix characterized by complex eigenfrequencies and directly observe the ST disclination states. Our work further extends the bulk-disclination correspondence to the non-Hermitian realm.
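
The Zak phase mentioned above is the Berry phase a Bloch band accumulates across the Brillouin zone. As a simplified illustration only, the Python sketch below computes it for the ordinary Hermitian 1D SSH chain with a discretized Wilson loop. The paper's 2D non-Hermitian model needs the biorthogonal version (left and right eigenvectors), which this sketch does not implement, and the parameter names `v` (intracell hopping) and `w` (intercell hopping) are assumptions of the example.

```python
import numpy as np


def ssh_bloch_hamiltonian(k, v, w):
    """2x2 Bloch Hamiltonian of the Hermitian SSH chain (intracell v, intercell w)."""
    h = v + w * np.exp(1j * k)
    return np.array([[0, np.conj(h)], [h, 0]], dtype=complex)


def zak_phase(v, w, n_k=400):
    """Zak phase of the lower band from a discretized Wilson loop over k in [0, 2*pi)."""
    ks = np.linspace(0.0, 2.0 * np.pi, n_k, endpoint=False)
    states = []
    for k in ks:
        _, vecs = np.linalg.eigh(ssh_bloch_hamiltonian(k, v, w))
        states.append(vecs[:, 0])  # eigh sorts eigenvalues ascending: column 0 is the lower band
    states.append(states[0])  # close the loop; H(k) is 2*pi-periodic in this basis
    loop = 1.0 + 0.0j
    for a, b in zip(states[:-1], states[1:]):
        loop *= np.vdot(a, b)  # <u_k | u_{k+dk}>; arbitrary eigenvector phases cancel
    return -np.angle(loop)


print(round(abs(zak_phase(v=1.0, w=0.5)), 6))  # 0.0 (trivial)
print(round(abs(zak_phase(v=0.5, w=1.0)), 6))  # 3.141593 (pi, topological)
```

When the intercell hopping exceeds the intracell hopping, the phase is quantized to π; otherwise it is 0. The non-Hermitian case replaces the bra states with left eigenvectors.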

Preprint · 2026 · arXiv

Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding

Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordination bridge between perception and language reasoning. InfoCoordiBridge comprises (i) a unified multi-agent perception layer that outputs typed structured facts together with modality-focused synopses, (ii) an ICA module that aligns and fuses multi-source outputs into a single SceneSummary, and (iii) an SSRE module that performs SceneSummary-grounded reasoning with verification. Experiments on nuScenes and Waymo show that ICA preserves competitive 3D detection accuracy while substantially improving fusion consistency, reducing redundancy to below 1% and achieving about 98% attribute agreement. On NuScenes-QA and a template-aligned Waymo-QA benchmark, SSRE improves factual grounding and reduces hallucinated entity mentions compared with representative VLM and agentic baselines. Overall, by coordinating multi-sensor outputs into a single conflict-aware SceneSummary before prompting, InfoCoordiBridge prevents redundant and cross-modally inconsistent perception evidence from propagating into high-level reasoning.

Preprint · 2026 · arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, using a 128-token sliding window and a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), with a native 32k context length that is subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense, token-level rewards, enabling the student model to master the teachers' expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves an acceptance length of up to 3.6 and up to a 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
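
To make the attention layout concrete: a 5:1 hybrid ratio interleaves five sliding-window layers with one global layer, and a 128-token window limits each query to the 128 most recent positions. The Python/NumPy sketch below builds such a layer schedule and the corresponding boolean attention masks. Placing the global layer at the end of each group of six is an assumption for illustration (the abstract does not give the exact ordering), and `layer_types` and `attention_mask` are hypothetical helpers, not part of the released code.

```python
import numpy as np


def layer_types(num_layers, swa_per_global=5):
    """Hypothetical 5:1 schedule: every sixth layer uses global attention, the rest use SWA."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attention_mask(seq_len, window=None):
    """Boolean causal mask (True = query i may attend key j), optionally limited to a window."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i  # causal: no attention to future tokens
    if window is not None:
        mask &= (i - j) < window  # sliding window: only the last `window` positions, including self
    return mask


schedule = layer_types(12)
swa_mask = attention_mask(1024, window=128)
global_mask = attention_mask(1024)
# Keys visible to the final query: 128 for SWA layers versus 1024 for global layers.
print(schedule.count("swa"), schedule.count("global"))  # 10 2
print(swa_mask[-1].sum(), global_mask[-1].sum())        # 128 1024
```

Because only one layer in six attends over the full context, the SWA layers need to cache keys and values for at most 128 positions per sequence, which is a common motivation for hybrid designs of this kind.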

Preprint · 2025 · arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities, generating highly realistic talk shows, recitations, livestreams, and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio), and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

Preprint · 2025 · arXiv

Symmetry restoration and quantum Mpemba effect in many-body localization systems

Non-equilibrium dynamics of quantum many-body systems has attracted increasing attention owing to a variety of intriguing phenomena absent in equilibrium physics. A prominent example is the quantum Mpemba effect, where subsystem symmetry is restored more rapidly under a symmetric quench from a more asymmetric initial state. In this work, we investigate symmetry restoration and the quantum Mpemba effect in many-body localized systems for a range of initial states. We show that symmetry can still be restored in the many-body localization regime without approaching thermal equilibrium. Moreover, we demonstrate that the quantum Mpemba effect emerges universally for any tilted product state, in contrast to chaotic systems where its occurrence depends sensitively on the choice of the initial state. We further provide a theoretical analysis of symmetry restoration and the quantum Mpemba effect using an effective model for many-body localization. Overall, this paper fills an important gap in establishing a unified understanding of symmetry restoration and the quantum Mpemba effect in generic many-body systems, and it advances our understanding of many-body localization.

Preprint · 2022 · arXiv

A Temporal-oriented Broadcast ResNet for COVID-19 Detection

Detecting COVID-19 from audio signals, such as breathing and coughing, can serve as a fast and efficient pre-testing method to reduce virus transmission. Given the promising results of deep learning networks in modelling time sequences, and since applications that rapidly identify COVID-19 in the wild should require little computational effort, we present a temporal-oriented broadcasting residual learning method that achieves efficient computation and high accuracy with a small model size. Based on the EfficientNet architecture, our novel network, named Temporal-oriented ResNet (TorNet), consists of a broadcasting learning block, the Alternating Broadcast (AB) Block, which contains several Broadcast Residual Blocks (BC ResBlocks) and a convolution layer. With the AB Block, the network effectively obtains useful audio-temporal features and higher-level embeddings with much less computation than Recurrent Neural Networks (RNNs), which are typically used to model temporal information. TorNet achieves 72.2% Unweighted Average Recall (UAR) on the INTERSPEECH 2021 Computational Paralinguistics Challenge COVID-19 Cough Sub-Challenge, thereby showing competitive results with higher computational efficiency than other state-of-the-art alternatives.
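
Unweighted Average Recall, the metric reported above, is the mean of the per-class recalls, so a minority class such as COVID-positive coughs counts as much as the majority class. A minimal Python sketch of the computation (the function name is illustrative, not taken from the paper):

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Macro-averaged recall: compute recall per true class, then average with equal weight."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# An always-negative classifier reaches 80% accuracy on this imbalanced set but only 0.5 UAR.
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 10
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

This is why UAR, rather than plain accuracy, is the usual headline metric for imbalanced paralinguistics challenges.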

Preprint · 2022 · arXiv

Audio Self-supervised Learning: A Survey

Inspired by humans' cognitive ability to generalise knowledge and skills, Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations, which are expensive and time-consuming to obtain. Its success in computer vision and natural language processing has prompted its recent adoption in audio and speech processing. However, comprehensive reviews summarising the knowledge in audio SSL are currently missing. To fill this gap, we provide an overview of the SSL methods used for audio and speech processing applications. We also summarise the empirical works that exploit the audio modality in multi-modal SSL frameworks, as well as the existing benchmarks suitable for evaluating the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out future directions for the development of audio SSL.

Preprint · 2022 · arXiv

Gain and loss induced topological insulating phase in a non Hermitian electrical circuit

Considerable effort has been devoted to the study of topological phases in certain non-Hermitian systems that possess real eigenfrequencies in the presence of gain and loss. However, it is challenging to experimentally realize such non-Hermitian topological insulators in either quantum or photonic systems, owing to the difficulty of introducing controlled gain and loss. On the other hand, the wide choice of active circuit components provides unprecedented convenience and flexibility in engineering non-Hermitian topological insulators in electrical circuits. Here, we report the experimental realization of a one-dimensional (1D) non-Hermitian topological circuit that exhibits topologically protected edge states induced purely by gain and loss. We show that by tuning the values of the positive/negative resistors in the circuit, our system can switch between different topological phase regions. Topological edge states and interface states are observed at the circuit edge and at the interface between a trivial and a nontrivial circuit, respectively; they manifest as a prominent impedance peak at the mid-gap frequency that is topologically robust to variations of circuit parameters. Our work opens a new gateway towards actively controllable topological systems.

Preprint · 2022 · arXiv

M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval

Videos contain multi-modal content, and exploring multi-level cross-modal interactions with natural language queries can greatly benefit the text-video retrieval (TVR) task. However, recent methods that apply the large-scale pre-trained model CLIP to TVR do not focus on the multi-modal cues in videos. Furthermore, traditional methods that simply concatenate multi-modal features do not exploit fine-grained cross-modal information in videos. In this paper, we propose a multi-level multi-modal hybrid fusion (M2HF) network to explore comprehensive interactions between text queries and the content of each modality in videos. Specifically, M2HF first early-fuses visual features extracted by CLIP with audio and motion features extracted from videos, obtaining audio-visual fusion features and motion-visual fusion features, respectively. The multi-modal alignment problem is also considered in this process. Then, visual features, audio-visual fusion features, motion-visual fusion features, and texts extracted from videos establish cross-modal relationships with caption queries in a multi-level way. Finally, the retrieval outputs from all levels are late-fused to obtain the final text-video retrieval results. Our framework supports two training strategies: an ensemble manner and an end-to-end manner. Moreover, a novel multi-modal balance loss function is proposed to balance the contributions of each modality for efficient end-to-end training. M2HF obtains state-of-the-art results on various benchmarks, e.g., Rank@1 of 64.9%, 68.2%, 33.2%, 57.1%, and 57.8% on MSR-VTT, MSVD, LSMDC, DiDeMo, and ActivityNet, respectively.
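
The late-fusion step and the Rank@1 metric quoted above can be sketched in a few lines of Python/NumPy. The sketch assumes each level produces a query-by-video similarity matrix in which query i matches video i, and it fuses levels with a plain weighted sum (equal weights by default). This fusion rule, the toy similarity values, and the helper names are assumptions for illustration, not M2HF's published implementation.

```python
import numpy as np


def late_fuse(similarities, weights=None):
    """Weighted sum of per-level similarity matrices, each of shape (queries, videos)."""
    sims = np.stack(similarities)  # (levels, queries, videos)
    if weights is None:
        weights = np.full(len(similarities), 1.0 / len(similarities))
    return np.tensordot(np.asarray(weights, dtype=float), sims, axes=1)


def rank_at_k(sim, k=1):
    """Fraction of queries whose ground-truth video (same index) appears in the top-k results."""
    ranking = np.argsort(-sim, axis=1)
    targets = np.arange(sim.shape[0])[:, None]
    return float((ranking[:, :k] == targets).any(axis=1).mean())


visual = np.array([[0.9, 0.1, 0.0], [0.2, 0.1, 0.7], [0.0, 0.3, 0.6]])
audio_visual = np.array([[0.5, 0.4, 0.1], [0.1, 0.9, 0.0], [0.2, 0.5, 0.3]])
# Each level alone ranks 2 of 3 queries correctly; the fused scores rank all 3 correctly.
print(rank_at_k(visual), rank_at_k(audio_visual), rank_at_k(late_fuse([visual, audio_visual])))
```

Each level makes a different mistake, so averaging lets the correct match win every query; this complementarity is the general motivation for fusing retrieval outputs across levels.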

Preprint · 2022 · arXiv

Multi-Forgery Detection Challenge 2022: Push the Frontier of Unconstrained and Diverse Forgery Detection

In this paper, we present the Multi-Forgery Detection Challenge, held concurrently with the IEEE Computer Society Workshop on Biometrics at CVPR 2022. The challenge aims to detect automatic image manipulations, including but not limited to image editing, image synthesis, image generation, and Photoshop-based manipulation. It attracted 674 teams from all over the world, with about 2,000 valid result submissions. We invited the top 10 teams to present their solutions, and three of them were awarded prizes in the grand finale. In this paper, we present the solutions of the top 3 teams to boost research in the field of image forgery detection.

Preprint · 2022 · arXiv

Surface critical properties of the three-dimensional clock model

Using Monte Carlo simulations and finite-size scaling analysis, we show that the $q$-state clock model with $q=6$ on the simple cubic lattice with open surfaces has a rich phase diagram; in particular, it has an extraordinary-log phase in addition to the ordinary and extraordinary transitions at the bulk critical point. We show numerically that the presence of the intermediate extraordinary-log phase is due to the emergence of an O(2) symmetry in the surface state before the surface enters the $Z_{q}$ symmetry-breaking region as the surface coupling is increased at the bulk critical point, while O(2) symmetry emerges for the bulk. The critical behaviors of the extraordinary-log transition, as well as those of the ordinary transition and of the special transition separating the ordinary and extraordinary-log transitions, are obtained.

Preprint · 2020 · arXiv

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

The COVID-19 outbreak was announced as a global pandemic by the World Health Organisation in March 2020 and has affected a growing number of people in the past few weeks. In this context, advanced artificial intelligence techniques are brought to the fore to help fight against and reduce the impact of this global health crisis. In this study, we focus on developing potential use cases of intelligent speech analysis for patients diagnosed with COVID-19. In particular, by analysing speech recordings from these patients, we construct audio-only models to automatically categorise the patients' health state in four aspects: severity of illness, sleep quality, fatigue, and anxiety. For this purpose, two established acoustic feature sets and support vector machines are utilised. Our experiments show that an average accuracy of .69 is obtained when estimating the severity of illness, which is derived from the number of days in hospitalisation. We hope that this study can foster an extremely fast, low-cost, and convenient way to automatically detect COVID-19.

Preprint · 2020 · arXiv

Consensus, Bi-polarization and Multiformity in Opinion Dynamics with Bidirectional Thresholds

Many empirical networks are intrinsically pluralistic, with interactions occurring within groups of arbitrary agents. An agent in such a network can be influenced by different types of neighbors; common examples include similarity, opposition, and negligibility. Although neighbor influence in complex real-world systems can be accurately described as amicable or antagonistic relationships, and research on the dynamics of public-opinion evolution under different types of influence is valuable, few studies have addressed this issue. In this paper, we develop a novel model on networks of agents with bi-directional bounded thresholds to study the evolution of opinion dynamics. We define the scopes of individual assimilation and exclusion to identify different types of neighbors, and we calculate the impact of the corresponding neighbors on each individual by converting the opinion difference. The simulation results show that the proposed mechanism can effectively explain the formation of bi-polarization during opinion evolution, and that the settings of the bi-directional bounded thresholds significantly influence the eventual distribution of opinions. Furthermore, we explore the impacts of the initial conditions and of the small-world network structure on the evolution of opinions.
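
The abstract does not spell out the update rule, so the following Python sketch is only one plausible reading of bi-directional bounded thresholds, with every parameter hypothetical. A neighbor whose opinion differs by less than an assimilation threshold pulls the agent closer, one that differs by more than an exclusion threshold pushes it away, and anyone in between is ignored (the negligible case). Opinions are assumed to lie in [-1, 1].

```python
def step(opinions, neighbors, eps_assim=0.2, eps_excl=0.6, mu=0.1):
    """One synchronous update under hypothetical bi-directional bounded thresholds.

    |d| < eps_assim  -> assimilation: move toward the neighbor by mu * d
    |d| > eps_excl   -> exclusion: move away from the neighbor by mu * d
    otherwise        -> negligible neighbor, no influence
    """
    new = list(opinions)
    for i, nbrs in neighbors.items():
        delta = 0.0
        for j in nbrs:
            d = opinions[j] - opinions[i]
            if abs(d) < eps_assim:
                delta += mu * d
            elif abs(d) > eps_excl:
                delta -= mu * d
        new[i] = min(1.0, max(-1.0, opinions[i] + delta))  # keep opinions in [-1, 1]
    return new


pair = {0: [1], 1: [0]}
print(step([0.0, 0.1], pair))   # close opinions converge
print(step([-0.5, 0.5], pair))  # distant opinions diverge
print(step([0.0, 0.4], pair))   # intermediate difference: unchanged
```

Iterating the second case drives the pair toward -1 and +1, the kind of bi-polarization the abstract describes; in a full study, `neighbors` would come from a small-world network rather than a single pair.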

Preprint · 2020 · arXiv

NetReduce: RDMA-Compatible In-Network Reduction for Distributed DNN Training Acceleration

We present NetReduce, a novel RDMA-compatible in-network reduction architecture to accelerate distributed DNN training. Compared to existing designs, NetReduce maintains a reliable connection between end-hosts in the Ethernet and does not terminate the connection in the network. The advantage of doing so is that we can fully reuse the congestion control and reliability designs of RoCE. Meanwhile, we do not need to implement a high-cost network protocol processing stack in the switch, as InfiniBand (IB) does. The FPGA-based prototype is an out-of-the-box solution that requires no modification of commodity devices such as NICs or switches. For coordination between the end-host and the switch, NetReduce customizes the transport protocol only on the first packet of a data message, to comply with RoCE v2. A dedicated status-monitoring module is designed to reuse the reliability mechanism of RoCE v2 to handle packet loss. A message-level credit-based flow control algorithm is also proposed to fully utilize bandwidth and avoid buffer overflow. We study the effects of intra-machine bandwidth on training performance in multi-machine, multi-GPU scenarios and give sufficient conditions for hierarchical NetReduce to outperform other algorithms. We also extend the design from rack-level aggregation to a more general spine-leaf topology in the data center. NetReduce accelerates training by up to 1.7x and 1.5x for CNN-based CV and transformer-based NLP tasks, respectively. Simulations on large-scale systems indicate that NetReduce scales better than the state-of-the-art ring all-reduce.
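
Credit-based flow control, mentioned above, lets a sender transmit only while it holds credits that represent free receive-buffer space, so the buffer cannot overflow. The toy Python model below illustrates the idea at message granularity. It is a generic sketch built around a hypothetical `CreditLink` class, not NetReduce's FPGA logic or its RoCE v2 integration.

```python
from collections import deque


class CreditLink:
    """Toy message-level credit-based flow control between a sender and a bounded buffer."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # one credit per free slot in the receiver's buffer
        self.buffer = deque()

    def try_send(self, message):
        """Send only if a credit is available; otherwise apply back-pressure."""
        if self.credits == 0:
            return False  # sender must wait instead of overflowing the buffer
        self.credits -= 1
        self.buffer.append(message)
        return True

    def drain_one(self):
        """Receiver consumes a message (e.g. after reducing it) and returns a credit."""
        message = self.buffer.popleft()
        self.credits += 1
        return message


link = CreditLink(buffer_slots=2)
print([link.try_send(m) for m in ("m0", "m1", "m2")])  # [True, True, False]
print(link.drain_one(), link.try_send("m2"))            # m0 True
```

The buffer occupancy never exceeds the initial credit count, which is the property that lets a switch with small on-chip memory avoid overflow without dropping packets.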

Preprint · 2020 · arXiv

Towards implementation of a magic optical-dipole trap for confining ground-state and Rydberg-state cesium cold atoms

A long ground-Rydberg coherence lifetime is attractive for implementing high-fidelity quantum logic gates, studying many-body physics, and realizing other quantum information protocols. However, the potential formed by a conventional far-off-resonance red-detuned optical-dipole trap (ODT) is usually repulsive for Rydberg atoms, which results in fast atom loss and a low repetition rate of the experimental sequence. These issues can be addressed by a magic ODT. We calculated the ODT's magic detuning for confining cesium ground-state and Rydberg-state atoms in the same potential well. We used a sum-over-states method to calculate the dynamic polarizabilities of the $6S_{1/2}$ ground state and highly excited ($nS_{1/2}$ and $nP_{3/2}$) Rydberg states of cesium atoms, and identified the corresponding magic detunings for optical wavelengths in the range $850$–$2000$ nm. We estimated the trapping lifetime of cesium Rydberg atoms confined in the magic ODT by including different dissipative mechanisms. Furthermore, we experimentally realized an 1879.43-nm single-frequency laser system with watt-level output power for setting up the magic ODT for $6S_{1/2}$ ground-state and $84P_{3/2}$ Rydberg-state cesium cold atoms.
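
For reference, the sum-over-states expression commonly used for the scalar dynamic polarizability of a state $i$ with total angular momentum $J_i$, together with the resulting trap potential and the magic condition, is given below. This is the standard textbook form, not the paper's own equations. It neglects the tensor part that also contributes for $nP_{3/2}$ states, and the reduced matrix elements $\langle k \| d \| i \rangle$ and transition frequencies $\omega_{ki} = (E_k - E_i)/\hbar$ are inputs that would come from atomic data; $E_0$ is the optical field amplitude.

```latex
\alpha_i(\omega) = \frac{2}{3\hbar\,(2J_i + 1)} \sum_{k \neq i}
  \frac{\omega_{ki}\,\lvert\langle k \,\|\, d \,\|\, i \rangle\rvert^{2}}{\omega_{ki}^{2} - \omega^{2}},
\qquad
U_i(\omega) = -\frac{1}{4}\,\alpha_i(\omega)\,\lvert E_0 \rvert^{2},
\qquad
\alpha_{6S_{1/2}}(\omega_{\mathrm{m}}) = \alpha_{nP_{3/2}}(\omega_{\mathrm{m}}).
```

At a magic frequency $\omega_{\mathrm{m}}$, both states acquire the same light shift and therefore see the same trapping potential $U$.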