Researcher profile

Changhao Li

Changhao Li contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
6 topics
4 close collaborators

Published work

7 published items

Preprint, 2026, arXiv

Discrimination Is Generation: Unifying Ranking and Retrieval from a Tokenizer Perspective

Semantic IDs (SIDs) define the generation space of generative recommendation and directly determine its personalization ceiling. However, existing tokenizers are trained independently with retrieval objectives, leaving personalization signals fully decoupled from the SID construction process, a fundamental gap that causes generative retrieval to persistently lag behind discriminative ranking. In this paper, we rethink the essence of SIDs: ranking seeks argmax in item space while retrieval seeks argmax in token space; both are the same problem solved at different granularities. Based on this insight, we propose DIG (Discrimination Is Generation), which embeds the tokenizer inside a discriminative ranking model for end-to-end training: the ranker naturally becomes a retrieval model, yielding two models from a single training run. DIG is organized around a feature assignment taxonomy: item-intrinsic static features are encoded into SIDs, user-item cross features (u2i) implicitly drive codebook boundaries toward recommendation decision boundaries during training, and an MLP_u2t distillation module approximates u2i at the token level for inference. Experiments on three public benchmarks and two industrial datasets demonstrate that DIG simultaneously improves ranking, retrieval, and unified retrieval-ranking quality.

Preprint, 2026, arXiv

Exploration-Driven Optimization for Test-Time Large Language Model Reasoning

Post-training techniques combined with inference-time scaling significantly enhance the reasoning and alignment capabilities of large language models (LLMs). However, a fundamental tension arises: inference-time methods benefit from diverse sampling from a relatively flattened probability distribution, whereas reinforcement learning (RL)-based post-training inherently sharpens these distributions. To address this, we propose Exploration-Driven Optimization (EDO), which extends reward-biasing style exploration objectives to iterative post-training and integrates them into standard RL objectives, encouraging greater diversity in sampled solutions while facilitating more effective inference-time computation. We incorporate EDO into iterative Direct Preference Optimization (iDPO) and Group Relative Policy Optimization (GRPO), resulting in two variants: ED-iDPO and ED-GRPO. Extensive experiments demonstrate that both ED-iDPO and ED-GRPO exhibit greater solution diversity and improved reasoning abilities, particularly when combined with test-time computation techniques like self-consistency. Across three in-distribution reasoning benchmarks, EDO achieves a 1.0-1.3% improvement over the strongest baselines, and delivers an additional 1.5% average gain on five out-of-distribution tasks. Beyond accuracy, EDO preserves model entropy and stabilizes RL training dynamics, highlighting its effectiveness in preventing over-optimization collapse. Taken together, these results establish EDO as a practical framework for balancing exploration and exploitation in LLM reasoning, especially in settings that rely on test-time scaling.

Preprint, 2026, arXiv

Q-CHOP: Quantum constrained Hamiltonian optimization

Combinatorial optimization problems that arise in science and industry typically have constraints. Yet the presence of constraints makes them challenging to tackle using both classical and quantum optimization algorithms. We propose a new quantum algorithm for constrained optimization, which we call quantum constrained Hamiltonian optimization (Q-CHOP). Our algorithm leverages the observation that for many problems, while the best solution is difficult to find, the worst feasible (constraint-satisfying) solution is known. The basic idea of Q-CHOP is to enforce a Hamiltonian constraint at all times, thereby restricting evolution to the subspace of feasible states, and slowly "rotate" an objective Hamiltonian to trace an adiabatic path from the worst feasible state to the best feasible state. Q-CHOP thereby assigns qualitatively distinct roles to the constraint and objective functions of a constrained optimization problem. We additionally propose a version of Q-CHOP that can start in any feasible state. Finally, we benchmark Q-CHOP against the commonly used adiabatic algorithm of quantum annealing with an objective function that penalizes constraint violation, and find that Q-CHOP consistently performs significantly better on a wide range of problems, including textbook graph problems, knapsack problems, combinatorial auctions, and a real-world financial use case of bond exchange-traded fund basket optimization.

Preprint, 2026, arXiv

Revisiting DAgger in the Era of LLM-Agents

Long-horizon LM agents learn from multi-turn interaction, where a single early mistake can alter the subsequent state distribution and derail the whole trajectory. Existing recipes fall short in complementary ways: supervised fine-tuning provides dense teacher supervision but suffers from covariate shift because it is trained on off-policy teacher trajectories, while reinforcement learning with verifiable rewards avoids this off-policy mismatch by learning from on-policy rollouts but receives only sparse outcome feedback. We address this dilemma by revisiting Dataset Aggregation (DAgger) for multi-turn LM agents: the algorithm collects trajectories through a turn-level interpolation of student and teacher policies, and the student is then trained on these trajectories using supervised labels provided by the teacher. By interacting directly with environments, the student is exposed to realistic states likely to be encountered during deployment, which mitigates covariate shift. Moreover, because the student learns by imitating the teacher's behavior, it receives rich feedback during learning. To demonstrate that DAgger combines the benefits of both approaches, we use the algorithm to train a software-engineering agent with 4B- and 8B-scale student models. On SWE-bench Verified, our DAgger-style training improves over the strongest post-training baseline by +3.9 points at 4B and +3.6 points at 8B. The resulting 4B agent reaches 27.3%, outperforming representative published 8B SWE-agent systems, while the 8B agent achieves 29.8%, surpassing SWE-Gym-32B and coming within 5 points of stronger 32B-scale agents. Together with consistent gains on the held-out SWE-Gym split, these results suggest that DAgger is effective for modern long-horizon LM agents.
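
The turn-level interpolation described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the toy integer environment, the `student` and `teacher` callables, and the mixing probability `beta` are assumptions chosen only to show the classic DAgger data-collection pattern, in which the executed action comes from a mixture of the two policies while the stored label always comes from the teacher.

```python
import random

def collect_dagger_trajectory(env_reset, env_step, student, teacher,
                              beta, max_turns, rng):
    """Roll out one episode, choosing the acting policy per turn.

    With probability `beta` the teacher's action is executed, otherwise
    the student's. Every visited state is labeled with the teacher's
    action, regardless of which policy acted.
    """
    state = env_reset()
    dataset = []
    for _ in range(max_turns):
        teacher_action = teacher(state)
        dataset.append((state, teacher_action))  # label always from teacher
        if rng.random() < beta:
            action = teacher_action
        else:
            action = student(state)
        state, done = env_step(state, action)
        if done:
            break
    return dataset

# Toy environment (an assumption for illustration): walk from 0 to 3.
def toy_reset():
    return 0

def toy_step(state, action):
    nxt = state + action
    return nxt, nxt == 3

def toy_teacher(state):
    return 1   # always steps toward the goal

def toy_student(state):
    return -1  # an untrained student that steps the wrong way

rng = random.Random(0)
teacher_only = collect_dagger_trajectory(
    toy_reset, toy_step, toy_student, toy_teacher, 1.0, 10, rng)
student_only = collect_dagger_trajectory(
    toy_reset, toy_step, toy_student, toy_teacher, 0.0, 4, rng)
```

With `beta = 0.0` the student drives the rollout, so the dataset contains the off-course states the student actually reaches, each paired with the teacher's corrective action. Those pairs are what a subsequent supervised training step would consume.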

Preprint, 2023, arXiv

Ion sensors with crown ether-functionalized nanodiamonds

Alkali metal ions such as sodium and potassium cations play fundamental roles in biology. Developing highly sensitive and selective methods to both detect and quantify these ions is of considerable importance for medical diagnostics and bioimaging. Fluorescent nanoparticles have emerged as powerful tools for nanoscale imaging, but their optical properties need to be supplemented with specificity to particular chemical and biological signals in order to provide further information about biological processes. Nitrogen-vacancy (NV) centers in diamond are particularly attractive as fluorescence markers, thanks to their optical stability, biocompatibility, and ability to serve as highly sensitive quantum sensors of temperature, magnetic fields, and electric fields in ambient conditions. In this work, by covalently grafting crown ether structures on the surface of nanodiamonds (NDs), we build sensors that are capable of detecting specific alkali ions such as sodium cations. We show that the presence of these metal ions modifies the charge state of NV centers inside the ND, which can then be read out by measuring their photoluminescence spectrum. Our work paves the way for designing selective biosensors based on NV centers in diamond.

Preprint, 2021, arXiv

SARS-CoV-2 quantum sensor based on nitrogen-vacancy centers in diamond

The development of highly sensitive and rapid biosensing tools targeted to the highly contagious virus SARS-CoV-2 is critical to tackling the COVID-19 pandemic. Quantum sensors can play an important role, thanks to their superior sensitivity and rapid improvement in recent years. Here we propose a molecular transducer designed for nitrogen-vacancy (NV) centers in nanodiamonds, translating the presence of SARS-CoV-2 RNA into an unambiguous magnetic noise signal that can be optically read out. We evaluate the performance of the hybrid sensor, including its sensitivity and false negative rate, and compare it to widespread diagnostic methods. The proposed method is fast and promises to reach a sensitivity down to a few hundred RNA copies with a false negative rate below 1%. The proposed hybrid sensor can be further implemented with different solid-state defects and substrates, generalized to diagnose other RNA viruses, and integrated with CRISPR technology.

Preprint, 2020, arXiv

Effective routing design for remote entanglement generation on quantum networks

Quantum networks are a promising platform for many ground-breaking applications that lie beyond the capability of their classical counterparts. Efficient entanglement generation on quantum networks with relatively limited resources such as quantum memories is essential to fully realize the network's capabilities; solving this problem calls for careful network design, and work on it is still at an early stage. In this study we propose an effective routing scheme to enable automatic responses for multiple requests of entanglement generation between source-terminal stations on a quantum lattice network with finite edge capacities. Multiple connection paths are exploited for each connection request, while entanglement fidelity is ensured for each path by performing entanglement purification. The routing scheme is highly modular and flexible, embedding quantum operations within the algorithmic workflow, and its performance is evaluated from multiple perspectives. In particular, three algorithms are proposed and compared for the scheduling of capacity allocation on the edges of the quantum network. Building on the ideas of proportional share and progressive filling, which have been well studied in classical routing problems, we design a new scheduling algorithm, the propagatory update method, which in certain aspects outperforms the two algorithms based on classical heuristics in scheduling performance. The general solution scheme paves the way for the design of efficient routing and flow control protocols on practical quantum networks.
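
For readers unfamiliar with the classical heuristics this abstract mentions, the following is a minimal sketch of textbook progressive filling (max-min fair allocation) on edges with finite capacities. It is not the paper's propagatory update method, and it models no quantum operations such as purification. The function name, the edge and flow labels, and the capacities are all hypothetical, and the sketch assumes every flow uses at least one edge listed in `capacities`.

```python
def progressive_filling(capacities, routes):
    """Classical max-min fair rate allocation.

    capacities: {edge: capacity}
    routes:     {flow: [edges used by that flow]}
    All active flows grow at the same rate until some edge saturates;
    flows crossing a saturated edge are then frozen.
    """
    rates = {f: 0.0 for f in routes}
    remaining = dict(capacities)
    active = set(routes)
    while active:
        # Count the active flows crossing each edge.
        load = {e: sum(1 for f in active if e in routes[f]) for e in remaining}
        # Largest equal increment before the tightest edge fills up.
        step = min(remaining[e] / n for e, n in load.items() if n > 0)
        for f in active:
            rates[f] += step
        for e, n in load.items():
            remaining[e] -= step * n
        saturated = {e for e, n in load.items()
                     if n > 0 and remaining[e] <= 1e-12}
        # Freeze every flow that touches a saturated edge.
        active = {f for f in active
                  if not any(e in saturated for e in routes[f])}
    return rates

# Hypothetical example: two edges, three connection requests.
rates = progressive_filling(
    {"A": 1.0, "B": 2.0},
    {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]},
)
```

In this example edge A saturates first, freezing `f1` and `f2` at 0.5 each, after which `f3` absorbs the rest of edge B and ends at 1.5.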