Source author record

Duo Zhang

Duo Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Machine Learning, physics.comp-ph, quant-ph, Artificial Intelligence, cond-mat.mtrl-sci, Databases, eess.AS, Information Theory, math.IT, physics.chem-ph, Sound

Catalog footprint

What is connected

8 works

12 topics

4 close collaborators

Preprint, 2026, arXiv

CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models

Crystal generative models mainly learn what stable crystals look like, with little explicit supervision for what makes them stable. We reveal a substantial representation gap between state-of-the-art crystal generative models and pretrained universal machine learning interatomic potentials (MLIPs) via energy probing, and show this gap can be closed by a simple training-time alignment. We propose Crystal REPresentation Alignment (CrystalREPA), a plug-and-play framework that aligns the atom-wise hidden states of generative encoders with frozen MLIP representations through an element-aware contrastive objective, transferring stability-aware atomistic priors with marginal training overhead and no additional inference cost. Across three generative frameworks, ten MLIP teachers, and two benchmark datasets, CrystalREPA consistently improves the thermodynamic stability, structural validity, and structural fidelity of generated crystals. Equally important, we find that an MLIP's transfer effectiveness is poorly predicted by its accuracy on standard leaderboards (e.g., Matbench Discovery) but strongly predicted by the distinguishability of its atom-wise representation space, yielding a practical, accuracy-independent criterion for selecting MLIP teachers for generative transfer.

Preprint, 2026, arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
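
The hybrid attention layout described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract gives the 5:1 ratio and the 128-token window, but the exact layer ordering and whether the window counts the current token are assumptions here, and the function names `layer_schedule` and `can_attend` are hypothetical.

```python
def layer_schedule(num_layers, swa_per_global=5):
    """Label each layer 'swa' or 'global'.

    Assumption: every run of `swa_per_global` sliding-window layers is
    followed by one global-attention layer.
    """
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa"
            for i in range(num_layers)]


def can_attend(query_pos, key_pos, kind, window=128):
    """Causal visibility of key_pos from query_pos for one layer kind."""
    if key_pos > query_pos:
        return False  # causal: no attention to future tokens
    if kind == "global":
        return True   # global layers see the whole prefix
    # Assumption: the window counts the current token, so a query sees
    # itself and the previous `window - 1` tokens.
    return query_pos - key_pos < window


schedule = layer_schedule(12)
print(schedule.count("swa"), schedule.count("global"))  # prints "10 2"
```

Under these assumptions, only the global layers need to keep keys and values for the whole prefix; the sliding-window layers keep at most 128 positions each.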

Preprint, 2026, arXiv

Multi-Task Fine-Tuning Enables Robust Out-of-Distribution Generalization in Atomistic Models

Accurate de novo molecular and materials design requires structure-property models that generalize beyond known regimes. Although pretrained atomistic models achieve strong in-distribution accuracy after fine-tuning, their reliability under out-of-distribution (OOD) conditions remains unclear. We identify a critical failure mode in downstream adaptation: standard fine-tuning induces representation collapse, erasing pretrained chemical and structural priors and severely degrading OOD performance. To address this limitation, we propose multi-task fine-tuning (MFT), which jointly optimizes downstream property prediction with a physically grounded force-field objective inherited from pretraining. This approach preserves essential chemical priors while enabling task-specific adaptation. Across molecular and materials benchmarks, MFT consistently improves OOD generalization, approaching the theoretical limit set by in-distribution accuracy, while outperforming standard fine-tuning, training from scratch, and state-of-the-art task-specific models. These results establish safe adaptation as a central requirement for large atomistic models and position MFT as a practical and data-efficient pathway toward robust molecular and materials discovery.
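
The joint objective described above can be illustrated as a weighted sum of a property loss and a force loss. This is a minimal sketch, not the authors' implementation: the mean-squared-error form, the `force_weight` parameter, and the function names are assumptions made for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(prop_pred, prop_true, force_pred, force_true,
                   force_weight=1.0):
    """Property loss plus a weighted force-field loss.

    Forces are given as one [fx, fy, fz] list per atom and are flattened
    before the error is computed. `force_weight` is a hypothetical
    hyperparameter balancing the two tasks.
    """
    flat_pred = [c for atom in force_pred for c in atom]
    flat_true = [c for atom in force_true for c in atom]
    property_term = mse(prop_pred, prop_true)
    force_term = mse(flat_pred, flat_true)
    return property_term + force_weight * force_term


loss = multitask_loss(
    prop_pred=[1.0, 2.0], prop_true=[1.0, 4.0],
    force_pred=[[0.0, 0.0, 1.0]], force_true=[[0.0, 0.0, 0.0]],
    force_weight=3.0,
)
```

Setting `force_weight=0.0` recovers standard single-task fine-tuning in this sketch, which makes the two regimes easy to compare.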

Preprint, 2025, arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities and can generate highly realistic talk shows, recitations, livestreams, and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio), and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

Preprint, 2016, arXiv

Transmit design for MIMO wiretap channel with a malicious jammer

In this paper, we consider the transmit design for a multi-input multi-output (MIMO) wiretap channel that includes a malicious jammer. We first transform the system model into the traditional three-node wiretap channel by whitening the interference at the legitimate user. The eavesdropper channel state information (ECSI) may be fully known, statistically known, or unknown to the transmitter, so we propose strategies for each level of ECSI availability. For the case of unknown ECSI, a target rate for the legitimate user is first specified. An inverse water-filling algorithm then finds the optimal power allocation for each information symbol, with a stepwise search adjusting the spatial dimension allocated to artificial noise (AN) so that the target rate is achievable. For the case of statistical ECSI, several simulated channels are randomly generated according to the distribution of ECSI. We show that the ergodic secrecy capacity can be approximated as the average secrecy capacity of these simulated channels. By maximizing this average secrecy capacity, we obtain a feasible power and spatial dimension allocation scheme using a one-dimensional search. Finally, numerical results demonstrate the effectiveness and computational efficiency of our algorithms.
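
The statistical-ECSI idea above, averaging the secrecy rate over simulated eavesdropper channels and then running a one-dimensional search, can be illustrated with a heavily simplified scalar model. This is not the paper's MIMO algorithm: single-antenna channels, exponentially distributed eavesdropper gains (Rayleigh fading), a search over transmit power only, and all function names are assumptions made for illustration.

```python
import math
import random


def secrecy_rate(power, main_gain, eve_gain):
    """Scalar Gaussian wiretap secrecy rate in bits per channel use."""
    main = math.log2(1.0 + power * main_gain)
    eve = math.log2(1.0 + power * eve_gain)
    return max(0.0, main - eve)


def average_secrecy_rate(power, main_gain, eve_gains):
    """Sample average over simulated eavesdropper channel gains."""
    total = sum(secrecy_rate(power, main_gain, g) for g in eve_gains)
    return total / len(eve_gains)


def best_power(main_gain, eve_gains, max_power, steps=50):
    """One-dimensional grid search for the power with the best average."""
    candidates = [max_power * i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda p: average_secrecy_rate(p, main_gain, eve_gains))


# Simulated eavesdropper gains; the unit-mean exponential is an assumption.
rng = random.Random(0)
eve_gains = [rng.expovariate(1.0) for _ in range(1000)]
chosen = best_power(main_gain=2.0, eve_gains=eve_gains, max_power=10.0)
```

The same pattern, simulate channels, average the objective, search one parameter, carries over to the matrix-valued case, where the rate expressions involve log-determinants instead of scalar logarithms.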

Preprint, 2014, arXiv

Global quantum discord in infinite quantum spin chains

In this paper, we study global quantum discord (GQD) in infinite-size spin chains. For this purpose, in the framework of matrix product states (MPSs), we propose an effective procedure to calculate GQD (denoted as Gn) for consecutive n-site subchains in infinite chains. For a spin-1/2 three-body interaction model, whose ground state can be exactly expressed as MPSs, we use the procedure to study Gn with n up to 24. Then, for a spin-1/2 XXZ chain, we first use the infinite time-evolving block decimation (iTEBD) algorithm to obtain the approximate wavefunction in the form of MPSs, and then compute Gn with n up to 18. In both models, Gn grows linearly with n, that is, Gn = k*n + b. Moreover, in non-critical regions the slope k of Gn converges very fast, while in critical regions it converges relatively slowly, and these behaviors are explained by a clear physical picture in terms of short-range and long-range correlations. Based on these results, we propose to use Gn/n to describe the global correlations in infinite chains. Gn/n has two physical meanings. First, it can be regarded as "global discord per site", very similar to "energy per site" or "magnetization per site" in quantum magnetic systems. Second, Gn/n (when n is large enough) describes the quantum correlation between a single site and an (n-1)-site block. We then successfully apply our theory to an exactly soluble infinite-size spin XY chain, which is beyond the matrix product formula and whose Hamiltonian can reduce to the transverse-field Ising model and the XX model. The relation between GQD and quantum phase transitions in these models is discussed.
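
The linear relation Gn = k*n + b and the per-site quantity Gn/n can be made concrete with an ordinary least-squares fit. The sketch below uses placeholder values generated from an arbitrary line, not results from the paper, and the helper name `fit_line` is hypothetical.

```python
def fit_line(ns, gs):
    """Ordinary least-squares fit of g = k * n + b; returns (k, b)."""
    count = len(ns)
    mean_n = sum(ns) / count
    mean_g = sum(gs) / count
    cov = sum((n - mean_n) * (g - mean_g) for n, g in zip(ns, gs))
    var = sum((n - mean_n) ** 2 for n in ns)
    k = cov / var
    b = mean_g - k * mean_n
    return k, b


# Placeholder data only: an exact line with slope 0.5 and intercept 0.2.
ns = list(range(2, 19))
gs = [0.5 * n + 0.2 for n in ns]

k, b = fit_line(ns, gs)

# Gn / n = k + b / n, so the per-site value approaches the slope k as n grows.
per_site = [g / n for n, g in zip(ns, gs)]
```

Because Gn/n differs from k only by b/n, the per-site value approaches the slope as the subchain length increases.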

Preprint, 2014, arXiv

Multi-partite quantum nonlocality and Bell-type inequalities in an infinite-order quantum phase transition of the one-dimensional spin-1/2 XXZ chain

In this paper, by combining the infinite time-evolving block decimation (iTEBD) algorithm with Bell-type inequalities, we investigate multi-partite quantum nonlocality in an infinite one-dimensional quantum spin-1/2 XXZ system. A high hierarchy of multipartite nonlocality can be observed in the gapless phase of the model, whereas only the lowest hierarchy of multipartite nonlocality is observed in most regions of the gapped anti-ferromagnetic phase. Bell-type inequalities thereby disclose different correlation structures in the two phases of the system. Furthermore, at the infinite-order QPT (or Kosterlitz-Thouless QPT) point of the model, the correlation measures always show a local minimum value, regardless of the length of the subchains. This indicates that a relatively low hierarchy of multi-partite nonlocality would be observed at the infinite-order QPT point in a Bell-type experiment. The result is in contrast to the existing results of the second-order QPT in the one-dimensional XY model, where multi-partite nonlocality with the hierarchy has been observed. Thus, multi-partite nonlocality provides an alternative perspective for distinguishing between these two kinds of QPTs. Reliable clues for the existence of tripartite quantum entanglement have also been found.

Preprint, 2014, arXiv

Principled Graph Matching Algorithms for Integrating Multiple Data Sources

This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. Entity resolution (the data integration problem of performing noisy joins on structured data) typically proceeds by first hashing each record into zero or more blocks, scoring pairs of records that are co-blocked for similarity, and then matching pairs of sufficient similarity. In the most common case of matching two sources, it is often desirable for the final matching to be one-to-one (a record may be matched with at most one other); members of the database and statistical record linkage communities accomplish such matchings in the final stage by weighted bipartite graph matching on similarity scores. Such matchings are intuitively appealing: they leverage a natural global property of many real-world entity stores, that of being nearly deduped, and are known to provide significant improvements to precision and recall. Unfortunately, unlike the bipartite case, exact max-weight matching on multi-partite graphs is known to be NP-hard. Our two-fold algorithmic contributions approximate multi-partite max-weight matching: our first algorithm borrows optimization techniques common to Bayesian probabilistic inference; our second is a greedy approximation algorithm. In addition to a theoretical guarantee on the latter, we present comparisons on a real-world ER problem from Bing significantly larger than typically found in the literature, on publication data, and on a series of synthetic problems. Our results quantify significant improvements due to exploiting multiple sources, which are made possible by global one-to-one constraints linking otherwise independent matching sub-problems. We also discover that our algorithms are complementary: one is much more robust under noise, and the other is simple to implement and very fast to run.
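
To make the one-to-one constraint concrete, the sketch below shows a generic greedy matching for the two-source (bipartite) case: take scored pairs in descending order and accept a pair only if neither record is already matched. It is an illustration, not the paper's multi-partite algorithm; the function name, the threshold parameter, and the toy scores are assumptions.

```python
def greedy_one_to_one(scored_pairs, threshold=0.0):
    """Greedy one-to-one matching over (score, left_id, right_id) tuples.

    Pairs are visited from highest to lowest score; a pair is accepted
    only if both records are still unmatched and the score reaches
    `threshold`.
    """
    matched_left, matched_right = set(), set()
    matches = []
    ordered = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    for score, left, right in ordered:
        if score < threshold:
            break  # every remaining pair scores lower still
        if left in matched_left or right in matched_right:
            continue  # would violate the one-to-one constraint
        matched_left.add(left)
        matched_right.add(right)
        matches.append((left, right))
    return matches


# Toy similarity scores between records of two hypothetical sources.
pairs = [
    (0.9, "a1", "b1"),
    (0.8, "a1", "b2"),
    (0.7, "a2", "b1"),
    (0.3, "a2", "b2"),
]
result = greedy_one_to_one(pairs)
```

On this toy input the greedy result pairs a1 with b1 and a2 with b2 for a total score of 1.2, while the optimal one-to-one matching (a1 with b2, a2 with b1) scores 1.5, which shows why greedy matching is an approximation rather than an exact method.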
