Researcher profile

Teng Xiao

Teng Xiao contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified · Verification L1 · Unclaimed author
5 works
0 followers
5 topics
4 close collaborators

Published work

5 published items

preprint · 2026 · arXiv

EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

Language models encode substantial evaluative knowledge from pretraining, yet current post-training methods rely on external supervision (human annotations, proprietary models, or scalar reward models) to produce reward signals. Each imposes a ceiling. Human judgment cannot supervise capabilities beyond its own, proprietary APIs create dependencies, and verifiable rewards cover only domains with ground-truth answers. Self-improvement from a model's own evaluative capacity is a reward source that scales with the model itself, yet remains largely untapped by current methods. We introduce EvoLM, a post-training method that structures this capacity into explicit discriminative rubrics and uses them as training signal. EvoLM trains two capabilities within a single language model in alternation: (1) a rubric generator producing instance-specific evaluation criteria optimized for discriminative utility, which maximizes a small frozen judge's ability to distinguish preferred from dispreferred responses; and (2) a policy trained using those rubric-conditioned scores as reward. All preference signals are constructed from the policy's own outputs via temporal contrast with earlier checkpoints, requiring no human annotation or external supervision. EvoLM trains a Qwen3-8B model to generate rubrics that outperform GPT-4.1 on RewardBench-2 by 25.7%. The co-trained policy achieves 69.3% average on the OLMo3-Adapt suite, outperforming policies trained with GPT-4.1 prompted rubrics by 3.9% and with the state-of-the-art 8B reward model SkyWork-RM by 16%. Overall, EvoLM demonstrates that structuring a model's evaluative capacity into co-evolving discriminative rubrics enables self-improvement without external supervision.

preprint · 2026 · arXiv

Reinforcement Learning for Tool-Integrated Interleaved Thinking towards Cross-Domain Generalization

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in reasoning and tool utilization. However, the generalization of tool-augmented reinforcement learning (RL) across diverse domains remains a significant challenge. Standard paradigms often treat tool usage as a linear or isolated event, which becomes brittle when transferring skills from restricted domains (e.g., mathematics) to open-ended tasks. In this work, we investigate the cross-domain generalization of an LLM agent trained exclusively on mathematical problem-solving. To facilitate robust skill transfer, we propose Reinforcement Learning for Interleaved Tool Execution (RITE). Unlike traditional methods, RITE enforces a continuous "Plan-Action-Reflection" cycle, allowing the model to ground its reasoning in intermediate tool outputs and self-correct during long-horizon tasks. To effectively train this complex interleaved policy, we introduce Dr. GRPO, a robust optimization objective that utilizes token-level loss aggregation with importance sampling to mitigate reward sparsity and high-variance credit assignment. Furthermore, we employ a dual-component reward system and dynamic curriculum via online rollout filtering to ensure structural integrity and sample efficiency. Extensive experiments reveal that our approach, despite being trained solely on math tasks, achieves state-of-the-art performance across diverse reasoning domains, demonstrating high token efficiency and strong generalization capabilities.

preprint · 2020 · arXiv

Periodic driving induced helical Floquet channels with ultracold atoms in momentum space

Employing the external degrees of freedom of atoms as synthetic dimensions renders easy and new accesses to quantum engineering and quantum simulation. As a recent development, ultracold atoms undergoing two-photon Bragg transitions can be diffracted into a series of discrete momentum states to form a momentum lattice. Here we provide a detailed analysis of such a system, and, as a concrete example, report the observation of robust helical Floquet channels, by introducing periodic driving sequences. The robustness of these channels against perturbations is confirmed, as a test for their topological origin captured by Floquet winding numbers. The periodic switching demonstrated here serves as a testbed for more complicated Floquet engineering schemes, and offers exciting opportunities to study novel topological physics in a many-body setting with tunable interactions.

preprint · 2020 · arXiv

Topological quantum walks in momentum space with a Bose-Einstein condensate

We report the experimental implementation of discrete-time topological quantum walks of a Bose-Einstein condensate in momentum space. Introducing stroboscopic driving sequences to the generation of a momentum lattice, we show that the dynamics of atoms along the lattice is effectively governed by a periodically driven Su-Schrieffer-Heeger model, which is equivalent to a discrete-time topological quantum walk. We directly measure the underlying topological invariants through time-averaged mean chiral displacements, which are consistent with our experimental observation of topological phase transitions. We then observe interaction-induced localization in the quantum-walk dynamics, where atoms tend to populate a single momentum-lattice site under interactions that are non-local in momentum space. Our experiment opens up the avenue of investigating discrete-time topological quantum walks using cold atoms, where the many-body environment and tunable interactions offer exciting new possibilities.

preprint · 2020 · arXiv

Tunable non-reciprocal quantum transport through a dissipative Aharonov-Bohm ring in ultracold atoms

We report the experimental observation of tunable, non-reciprocal quantum transport of a Bose-Einstein condensate in a momentum lattice. By implementing a dissipative Aharonov-Bohm (AB) ring in momentum space and sending atoms through it, we demonstrate a directional atom flow by measuring the momentum distribution of the condensate at different times. While the dissipative AB ring is characterized by the synthetic magnetic flux through the ring and the laser-induced loss on it, both the propagation direction and transport rate of the atom flow sensitively depend on these highly tunable parameters. We demonstrate that the non-reciprocity originates from the interplay of the synthetic magnetic flux and the laser-induced loss, which simultaneously breaks the inversion and the time-reversal symmetries. Our results open up the avenue for investigating non-reciprocal dynamics in cold atoms, and highlight the dissipative AB ring as a flexible building element for applications in quantum simulation and quantum information.