Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
22works
0followers
17topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

22 published item(s)

preprint2026arXiv

Difference-in-Differences using Double Negative Controls and Graph Neural Networks for Unmeasured Network Confounding

Estimating causal effects from observational network data faces dual challenges of network interference and unmeasured confounding. To address this, we propose a general Difference-in-Differences framework that integrates double negative controls (DNC) and graph neural networks (GNNs). Based on the modified parallel trends assumption and DNC, semiparametric identification of direct and indirect causal effects is established. We then propose doubly robust estimators. Specifically, an approach combining GNNs with the generalized method of moments is developed to estimate the functions of high-dimensional covariates and network structure. Furthermore, we derive the estimator's asymptotic normality under the $ψ$-network dependence and approximate neighborhood interference. Simulations show the finite-sample performance of our estimators. Finally, we apply our method to analyze the impact of China's green credit policy on corporate green innovation.

preprint2026arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.

preprint2026arXiv

Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs

Although Large Multimodal Models (LMMs) have achieved strong performance on general video understanding, their susceptibility to textual prior shortcuts during causal discovery has been recognized as a critical deficit. The underlying mechanisms of this phenomenon remain incompletely understood, as existing benchmarks only measure response accuracy without revealing the sources and extent of the deficit. We introduce ProCauEval, a perturbation-based evaluation protocol that shifts from outcome assessment to mechanism diagnosis, probing causal discovery through five controlled configurations that systematically manipulate visual and textual modalities to decompose their respective contributions to model behavior and dissect the failure modes. Evaluating 17 mainstream LMMs, we find that models faithfully perceive video content yet systematically underexploit it during causal reasoning. We further observe that stronger post-training amplifies rather than mitigates textual prior reliance, and that higher baseline performance correlates with greater fragility under perturbation. To address these, we propose Anti-Distillation Policy Optimization (ADPO), a reinforcement learning framework built on negative teacher alignment, which augments GRPO by explicitly pushing the policy away from a prior-only counterfactual teacher induced by visual corruption. Specifically, ADPO maximizes the divergence between the policy distributions conditioned on the original and visually corrupted inputs, thereby forcing the model to ground its reasoning in visual evidence rather than textual shortcuts. Extensive experiments show that ADPO improves visual engagement without sacrificing fundamental comprehension, thus offering a preliminary step toward reliable causal discovery.

preprint2026arXiv

Tighter Regret Bounds for Contextual Action-Set Reinforcement Learning

We study episodic reinforcement learning with fixed reward and transition functions, but with episode-dependent admissible action sets that are observed at the start of each episode. Performance is measured by cumulative regret against the episode-wise optimal value, $\sum_{k=1}^K [V^{*,M^k} - V^{π^k,M^k}]$, where $M^k$ represents the action context in the $k$-th episode. We show that the MVP algorithm naturally extends to this framework and enjoys strong theoretical guarantees. In particular, we establish a minimax regret bound of $\widetilde{O}(\sqrt{SAH^3K\log L})$ for adversarial contexts, where $L$ denotes the number of possible contexts. This result implies a regret bound of $\widetilde{O}(\sqrt{SAH^3K})$ for stochastic contexts. We further translate the stochastic regret guarantee into a sample complexity bound of $\widetilde{O}(SAH^3/ε^2)$ for a fixed context distribution. In addition, we derive a gap-dependent regret bound of \[ \widetilde O\left( \inf_{p\in [0,1)} \left( \frac{1}{Δ_{\min}^{p}} + pKΔ_{\min}^{p} \right)\log K \cdot \mathrm{poly}(S,A,H) \right), \] where $Δ_{\min}^{p}$ is the global $p$-trimmed positive-gap floor over suboptimal $(h,s,a)$ triples. This bound can substantially improve upon the minimax rate when the relevant suboptimality gaps are large.

preprint2026arXiv

Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback

Aligning large language models (LLMs) with human preferences has proven effective for enhancing model capabilities, yet standard preference modeling using the Bradley-Terry model assumes transitivity, overlooking the inherent complexity of human population preferences. Nash learning from human feedback (NLHF) addresses this by framing non-transitive preferences as a two-player zero-sum game, where alignment reduces to finding the Nash equilibrium (NE). However, existing algorithms typically rely on regularization, incurring unavoidable bias when computing the duality gap in the original game. In this work, we provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($\mathtt{OMWU}$) in NLHF, showing that it achieves last-iterate linear convergence after a burn-in phase whenever an NE with full support exists, with an instance-dependent linear convergence rate to the original NE, measured by duality gaps. Compared to prior results in Wei et al. (2020), we do not require the assumption of NE uniqueness. Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values, enabling exponentially better dependence on instance-dependent constants than prior results. Experiments corroborate the theoretical strengths of $\mathtt{OMWU}$ in both tabular and neural policy classes, demonstrating its potential for LLM applications.

preprint2026arXiv

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

Bring-Your-Own-Key (BYOK) agent architectures let users route LLM traffic through third-party relays, creating a critical integrity gap: a malicious relay can modify an aligned LLM response after generation but before agent execution. We formalize this post-alignment tampering threat and show that, without end-to-end integrity, the relay can observe, suppress, or replace downstream messages, making even perfectly aligned LLMs ineffective against such attacks. We instantiate this threat as the Relay Tampering Attack (RTA), which performs multi-round strategic rewriting, minimal security-critical edits, and stealth restoration by resubmitting tampered outputs to the upstream LLM. Across AgentDojo and ASB with six LLMs, RTA achieves up to 99.1% attack success, outperforming prompt-injection baselines with modest overhead. Case studies on OpenClaw and Claude Code demonstrate real-world feasibility, and evaluations of four defenses show that none fully prevent RTA. Finally, we propose a time-based detection defense that mitigates RTA while preserving agent utility.

preprint2025arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million of hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities, capable of generating highly realistic talk shows, recitations, livestreaming and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio) and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

preprint2024arXiv

BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators

Packet loss is a common and unavoidable problem in voice over internet phone (VoIP) systems. To deal with the problem, we propose a band-split packet loss concealment network (BS-PLCNet). Specifically, we split the full-band signal into wide-band (0-8kHz) and high-band (8-24kHz). The wide-band signals are processed by a gated convolutional recurrent network (GCRN), while the high-band counterpart is processed by a simple GRU network. To ensure high speech quality and automatic speech recognition (ASR) compatibility, multi-task learning (MTL) framework including fundamental frequency (f0) prediction, linguistic awareness, and multi-discriminators are used. The proposed approach tied for 1st place in the ICASSP 2024 PLC Challenge.

preprint2024arXiv

SELM: Speech Enhancement Using Discrete Tokens and Language Models

Language models (LMs) have shown superior performances in various speech generation tasks recently, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement, harnessing semantic information holds potential advantages for speech enhancement tasks. In light of this, we propose SELM, a novel paradigm for speech enhancement, which integrates discrete tokens and leverages language models. SELM comprises three stages: encoding, modeling, and decoding. We transform continuous waveform signals into discrete tokens using pre-trained self-supervised learning (SSL) models and a k-means tokenizer. Language models then capture comprehensive contextual information within these tokens. Finally, a detokenizer and HiFi-GAN restore them into enhanced speech. Experimental results demonstrate that SELM achieves comparable performance in objective metrics alongside superior results in subjective perception. Our demos are available https://honee-w.github.io/SELM/.

preprint2022arXiv

A New Metric on Symmetric Group and Applications to Block Permutation Codes

Permutation codes have received a great attention due to various applications. For different applications, one needs permutation codes under different metrics. The generalized Cayley metric was introduced by Chee and Vu [4] and this metric includes several other metrics as special cases. However, the generalized Cayley metric is not easily computable in general. Therefore the block permutation metric was introduced by Yang et al. [22] as the generalized Cayley metric and the block permutation metric have the same magnitude. However, the block permutation metric lacks the symmetry property which restricts more advanced algebraic tools to be involved. In this paper, by introducing a novel metric closely related to the block permutation metric, we build a bridge between some advanced algebraic methods and codes in the block permutation metric. More specifically, based on some techniques from algebraic function fields originated in [19], we give an algebraic-geometric construction of codes in the novel metric with reasonably good parameters. By observing a trivial relation between the novel metric and block permutation metric, we then produce non-systematic codes in block permutation metric that improve all known results given in [21, 22]. More importantly, based on our non-systematic codes, we provide an explicit and systematic construction of codes in block permutation metric which improves the systematic result shown in [22]. In the end, we demonstrate that our codes in the novel metric itself have reasonably good parameters by showing that our construction beats the corresponding Gilbert-Varshamov bound.

preprint2022arXiv

A Pilot Study of Relating MYCN-Gene Amplification with Neuroblastoma-Patient CT Scans

Neuroblastoma is one of the most common cancers in infants, and the initial diagnosis of this disease is difficult. At present, the MYCN gene amplification (MNA) status is detected by invasive pathological examination of tumor samples. This is time-consuming and may have a hidden impact on children. To handle this problem, we adopt multiple machine learning (ML) algorithms to predict the presence or absence of MYCN gene amplification. The dataset is composed of retrospective CT images of 23 neuroblastoma patients. Different from previous work, we develop the algorithm without manually-segmented primary tumors which is time-consuming and not practical. Instead, we only need the coordinate of the center point and the number of tumor slices given by a subspecialty-trained pediatric radiologist. Specifically, CNN-based method uses pre-trained convolutional neural network, and radiomics-based method extracts radiomics features. Our results show that CNN-based method outperforms the radiomics-based method.

preprint2022arXiv

Back-ends Selection for Deep Speaker Embeddings

Probabilistic Linear Discriminant Analysis (PLDA) was the dominant and necessary back-end for early speaker recognition approaches, like i-vector and x-vector. However, with the development of neural networks and margin-based loss functions, we can obtain deep speaker embeddings (DSEs), which have advantages of increased inter-class separation and smaller intra-class distances. In this case, PLDA seems unnecessary or even counterproductive for the discriminative embeddings, and cosine similarity scoring (Cos) achieves better performance than PLDA in some situations. Motivated by this, in this paper, we systematically explore how to select back-ends (Cos or PLDA) for deep speaker embeddings to achieve better performance in different situations. By analyzing PLDA and the properties of DSEs extracted from models with different numbers of segment-level layers, we make the conjecture that Cos is better in same-domain situations and PLDA is better in cross-domain situations. We conduct experiments on VoxCeleb and NIST SRE datasets in four application situations, single-/multi-domain training and same-/cross-domain test, to validate our conjecture and briefly explain why back-ends adaption algorithms work.

preprint2022arXiv

GANimator: Neural Motion Synthesis from a Single Sequence

We present GANimator, a generative model that learns to synthesize novel motions from a single, short motion sequence. GANimator generates motions that resemble the core elements of the original motion, while simultaneously synthesizing novel and diverse movements. Existing data-driven techniques for motion synthesis require a large motion dataset which contains the desired and specific skeletal structure. By contrast, GANimator only requires training on a single motion sequence, enabling novel motion synthesis for a variety of skeletal structures e.g., bipeds, quadropeds, hexapeds, and more. Our framework contains a series of generative and adversarial neural networks, each responsible for generating motions in a specific frame rate. The framework progressively learns to synthesize motion from random noise, enabling hierarchical control over the generated motion content across varying levels of detail. We show a number of applications, including crowd simulation, key-frame editing, style transfer, and interactive control, which all learn from a single input sequence. Code and data for this paper are at https://peizhuoli.github.io/ganimator.

preprint2022arXiv

Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies

This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) that enjoys a regret bound \emph{independent on the planning horizon}. Specifically, we consider tabular MDP with $S$ states, $A$ actions, a planning horizon $H$, total reward bounded by $1$, and the agent plays for $K$ episodes. We design an algorithm that achieves an $O\left(\mathrm{poly}(S,A,\log K)\sqrt{K}\right)$ regret in contrast to existing bounds which either has an additional $\mathrm{polylog}(H)$ dependency~\citep{zhang2020reinforcement} or has an exponential dependency on $S$~\citep{li2021settling}. Our result relies on a sequence of new structural lemmas establishing the approximation power, stability, and concentration property of stationary policies, which can have applications in other problems related to Markov chains.

preprint2022arXiv

Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures

From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and have difficulty generalizing beyond training data given the complexity of the design space. Unsupervised and reinforcement learning-based approaches on the other hand can have very lengthy training or optimization times associated with them. Here we demonstrate a hybrid supervised learning and reinforcement learning approach to the inverse design of nanophotonic structures and show this approach can reduce training data dependence, improve the generalizability of model predictions, and shorten exploratory training times by orders of magnitude. The presented strategy thus addresses a number of contemporary deep learning-based challenges, while opening the door for new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.

preprint2022arXiv

Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics

Recent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics. However, with high-quality contextualized document representations, do we really need sophisticated neural models to obtain coherent and interpretable topics? In this paper, we conduct thorough experiments showing that directly clustering high-quality sentence embeddings with an appropriate word selecting method can generate more coherent and diverse topics than NTMs, achieving also higher efficiency and simplicity.

preprint2022arXiv

Long-Tailed Classification with Gradual Balanced Loss and Adaptive Feature Generation

The real-world data distribution is essentially long-tailed, which poses great challenge to the deep model. In this work, we propose a new method, Gradual Balanced Loss and Adaptive Feature Generator (GLAG) to alleviate imbalance. GLAG first learns a balanced and robust feature model with Gradual Balanced Loss, then fixes the feature model and augments the under-represented tail classes on the feature level with the knowledge from well-represented head classes. And the generated samples are mixed up with real training samples during training epochs. Gradual Balanced Loss is a general loss and it can combine with different decoupled training methods to improve the original performance. State-of-the-art results have been achieved on long-tail datasets such as CIFAR100-LT, ImageNetLT, and iNaturalist, which demonstrates the effectiveness of GLAG for long-tailed visual recognition.

preprint2022arXiv

Room-temperature superconductivity in heavy rare earth metal substituted sodalite-like clathrate hexahydrides under moderate pressure

Room temperature superconductivity is a dream that mankind has been chasing for a century. In recent years, the synthesis of H3S, LaH10 and C-S-H system has gradually made this dream a reality. But the extreme pressures required for the metallization of hydrogen-based superconductors limit their applications. In this work, we design a series of high temperature superconductors that can be stable at moderate pressures by incorporating heavy rare earth elements Yb/Lu into sodalite-like clathrate hydrides. In particular, the critical temperatures of Y3LuH24, YLuH12 and YLu3H24 are 283 K, 275 K and 288 K, respectively, which are close to or have reached room temperature, and the required pressure for stabilization of these hydrides is about 120 GPa which is significantly lower than that of reported room temperature superconductors. Our work provides an effective method for the rational design of low-pressure stabilized hydrogen-based superconductors with high-Tc and will stimulate further experimental exploration.

preprint2022arXiv

The HCCL System for the NIST SRE21

This paper describes the systems developed by the HCCL team for the NIST 2021 speaker recognition evaluation (NIST SRE21).We first explore various state-of-the-art speaker embedding extractors combined with a novel circle loss to obtain discriminative deep speaker embeddings. Considering that cross-channel and cross-linguistic speaker recognition are the key challenges of SRE21, we introduce several techniques to reduce the cross-domain mismatch. Specifically, Codec and speech enhancement are directly applied to the raw speech to eliminate the codecs and the environment noise mismatch. We denote the methods that work directly on speech to eliminate the relatively explicit mismatches collectively as data adaptation methods. Experiments show that data adaption methods achieve 15\% improvements over our baseline. Furthermore, some popular back-ends domain adaptation algorithms are deployed on speaker embeddings to alleviate speaker performance degradation caused by the implicit mismatch. Score calibration is a major failure for us in SRE21. The reason is that score calibration with too many parameters easily lead to overfitting problems.

preprint2021arXiv

Design Principles for High Temperature Superconductors with Hydrogen-based Alloy Backbone at Moderate Pressure

Hydrogen-based superconductors provide a route to the long-sought goal of room-temperature superconductivity, but the high pressures required to metallize these materials limit their immediate application. For example, carbonaceous sulfur hydride, the first room-temperature superconductor, can reach a critical temperature (Tc) of 288 K only at the extreme pressure of 267 GPa. The next recognized challenge is the realization of room-temperature superconductivity at significantly lower pressures. Here, we propose a strategy for the rational design of high-temperature superconductors at low pressures by alloying small-radius elements and hydrogen to form ternary hydride superconductors with alloy backbones. We identify a hitherto unknown fluorite-type backbone in compositions of the form AXH8, which exhibit high temperature superconductivity at moderate pressures. The Fm-3m phase of LaBeH8, with a fluorite-type H-Be alloy backbone, is predicted to be metastable and superconducting with a Tc ~ 191 K at 50 GPa; a substantially lower pressure than that required by the geometrically similar clathrate hydride LaH10 (170 GPa). Our approach paves the way for finding high-Tc ternary hydride superconductors at conditions close to ambient pressures.

preprint2020arXiv

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and episode length $H$. We propose a model-free algorithm UCB-Advantage and prove that it achieves $\tilde{O}(\sqrt{H^2SAT})$ regret where $T = KH$ and $K$ is the number of episodes to play. Our regret bound improves upon the results of [Jin et al., 2018] and matches the best known model-based algorithms as well as the information theoretic lower bound up to logarithmic factors. We also show that UCB-Advantage achieves low local switching cost and applies to concurrent reinforcement learning, improving upon the recent results of [Bai et al., 2019].

preprint2019arXiv

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle which is able to learn Reinforcement Learning (RL) modeled by Markov decision process (MDP) with finite state-action space efficiently. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SAHT})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored. } for MDP with $S$ states and $A$ actions, in the case that an upper bound $H$ on the span of $h^{*}$, i.e., $sp(h^{*})$ is known. This result outperforms the best previous regret bounds $\tilde{O}(S\sqrt{AHT}) $\citep{fruit2019improved} by a factor of $\sqrt{S}$. Furthermore, this regret bound matches the lower bound of $Ω(\sqrt{SAHT}) $\citep{jaksch2010near} up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of $\tilde{O}(\sqrt{SADT})$ for MDPs with a finite diameter $D$ compared to the lower bound of $Ω(\sqrt{SADT}) $\citep{jaksch2010near}.