Researcher profile

Kai Tang

Kai Tang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
8 works
0 followers
8 topics
4 close collaborators

Published work

8 published items

preprint | 2026 | arXiv

EXPO: Exploration-Prioritized Policy Optimization via Adaptive KL Regulation and Gaussian Curriculum Sampling

Reinforcement Learning with Verifiable Rewards (RLVR) has become the standard paradigm for LLM mathematical reasoning, where Group Relative Policy Optimization (GRPO) serves as the mainstream algorithm. We point out two understudied inefficiencies existing in GRPO. First, the fixed KL penalty coefficient overly restricts policy exploration at stages where the model requires significant deviation from the reference policy. Second, uniform sampling of training questions ignores that moderately difficult problems provide the most informative gradient signals for optimization. We propose Exploration-Prioritized Policy Optimization (EXPO) with two lightweight plug-in modules. The Accuracy-Conditioned KL Scaling (AKL) dynamically adjusts KL regularization strength through a smooth nonlinear function of batch average accuracy, relaxing the penalty when the model underperforms and strengthening it when the model achieves good results. The Gaussian Curriculum Sampling (GCS) assigns sampling weights to questions following a Gaussian distribution centered at moderate accuracy around 0.5, focusing training on the model's learning frontier. We conduct extensive experiments on DeepSeek-R1-Distill-Qwen-1.5B and Qwen3-8B-Base over six mathematical reasoning benchmarks. The results show EXPO steadily surpasses vanilla GRPO. It obtains an absolute gain of 13.34 on AIME 2025 pass@32, rising from 63.33 percent to 76.67 percent, and achieves an average pass@32 improvement of 2.66 on the 8B model. The much larger performance gains on pass@32 compared with pass@1 demonstrate that EXPO effectively enlarges the model's exploration boundary under a fixed inference cost budget.
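The Gaussian Curriculum Sampling idea described in this abstract can be illustrated with a minimal sketch. The abstract only says that sampling weights follow a Gaussian centered at an accuracy of about 0.5; the function names, the width `sigma=0.2`, and the use of weighted sampling with replacement are all assumptions made here for illustration, not the authors' implementation.

```python
import math
import random


def gaussian_curriculum_weights(accuracies, center=0.5, sigma=0.2):
    """Return normalized sampling weights that peak at `center` accuracy.

    `sigma` is a hypothetical width; the abstract does not state one.
    """
    raw = [math.exp(-((a - center) ** 2) / (2 * sigma ** 2)) for a in accuracies]
    total = sum(raw)
    return [w / total for w in raw]


def sample_questions(question_ids, accuracies, k, seed=0):
    """Draw k question ids (with replacement) using the Gaussian weights."""
    weights = gaussian_curriculum_weights(accuracies)
    rng = random.Random(seed)
    return rng.choices(question_ids, weights=weights, k=k)


if __name__ == "__main__":
    # Per-question accuracy estimates, e.g. the fraction of correct rollouts.
    accuracies = [0.0, 0.25, 0.5, 0.75, 1.0]
    for acc, w in zip(accuracies, gaussian_curriculum_weights(accuracies)):
        print(f"accuracy={acc:.2f} weight={w:.3f}")
```

Under these assumptions, questions the model always fails or always solves receive the smallest weights, and questions near 50% accuracy are drawn most often.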

preprint | 2026 | arXiv

FG-ExPO: Frontier-Guided Exploration-Prioritized Policy Optimization via Adaptive KL and Gaussian Curriculum

Reinforcement Learning with Verifiable Rewards (RLVR) has become the standard paradigm for LLM mathematical reasoning, with Group Relative Policy Optimization (GRPO) serving as the dominant algorithm. We identify two overlooked inefficiencies inherent in GRPO. First, a fixed KL coefficient overly restricts policy exploration at moments when the model needs to diverge significantly from the reference policy. Second, uniform question sampling overlooks that moderately difficult problems produce the most informative gradient signals. We propose FG-ExPO, short for Frontier-Guided Exploration-Prioritized Policy Optimization, which integrates two lightweight components. Accuracy-Conditioned KL Scaling (AKL) adjusts the KL penalty strength through a smooth nonlinear function of batch average accuracy, loosening the constraint when the model performs poorly and strengthening it when the model achieves satisfactory results. Gaussian Curriculum Sampling (GCS) assigns sampling weights to questions following a Gaussian distribution centered at a moderate accuracy level around 0.5, focusing model training on its learning frontier. We conduct evaluations on DeepSeek-R1-Distill-Qwen-1.5B and Qwen3-8B-Base across six mainstream mathematical reasoning benchmarks. Experimental results demonstrate that FG-ExPO consistently outperforms vanilla GRPO. It delivers an absolute improvement of 13.34 on the AIME 2025 pass@32 metric, rising from 63.33 percent to 76.67 percent, and obtains an average pass@32 gain of 2.66 on the 8B model. The substantially larger performance gains observed on pass@32 compared to pass@1 verify that FG-ExPO enlarges the model's effective exploration space under a fixed inference budget.
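Accuracy-Conditioned KL Scaling, as described in this abstract, maps the batch average accuracy to a KL penalty strength through a smooth nonlinear function. The abstract does not give the function itself, so the sketch below assumes a logistic (sigmoid) gate between two hypothetical bounds `beta_min` and `beta_max`. Every name and constant here is illustrative rather than taken from the paper.

```python
import math


def akl_coefficient(batch_accuracy, beta_min=0.001, beta_max=0.04,
                    steepness=10.0, midpoint=0.5):
    """Map batch accuracy in [0, 1] to a KL coefficient.

    Low accuracy gives a value near beta_min (looser constraint);
    high accuracy gives a value near beta_max (stronger constraint).
    The logistic form and all defaults are assumptions.
    """
    gate = 1.0 / (1.0 + math.exp(-steepness * (batch_accuracy - midpoint)))
    return beta_min + (beta_max - beta_min) * gate


if __name__ == "__main__":
    for acc in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"batch accuracy={acc:.1f} -> KL coefficient={akl_coefficient(acc):.4f}")
```

Any monotone smooth function with the same direction would match the abstract's description equally well; the sigmoid is simply a convenient bounded choice.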

preprint | 2026 | arXiv

Learning from Prompt itself: the Hierarchical Attribution Prompt Optimization

Optimization is fundamental across numerous disciplines, typically following an iterative process of refining an initial solution to enhance performance. This principle is equally critical in prompt engineering, where designing effective prompts for large language models constitutes a complex optimization challenge. A structured optimization approach requires automated or semi-automated procedures to develop improved prompts, thereby reducing manual effort, improving performance, and yielding an interpretable process. However, current prompt optimization methods often induce prompt drift, where new prompts fix prior failures but impair performance on previously successful tasks. Additionally, generating prompts from scratch can compromise interpretability. To address these limitations, this study proposes the Hierarchical Attribution Prompt Optimization (HAPO) framework, which introduces three innovations: (1) a dynamic attribution mechanism targeting error patterns in training data and prompting history, (2) semantic-unit optimization for editing functional prompt segments, and (3) multimodal-friendly progression supporting both end-to-end LLM and LLM-MLLM workflows. Applied in contexts like single/multi-image QA (e.g., OCRV2) and complex task analysis (e.g., BBH), HAPO demonstrates enhanced optimization efficiency, outperforming comparable automated prompt optimization methods and establishing an extensible paradigm for scalable prompt engineering.

preprint | 2022 | arXiv

Entanglement-Enhanced Quantum Metrology in Colored Noise by Quantum Zeno Effect

In open quantum systems, the precision of metrology inevitably suffers from noise. In Markovian open quantum dynamics, the precision cannot be improved by using entangled probes, although the measurement time is effectively shortened. However, it was predicted over a decade ago that in non-Markovian dynamics the error can be significantly reduced by the quantum Zeno effect (QZE) [Chin, Huelga, and Plenio, Phys. Rev. Lett. 109, 233601 (2012)]. In this work, we apply a recently developed quantum simulation approach to experimentally verify that entangled probes can improve the precision of metrology by the QZE. Up to $n=7$ qubits, we demonstrate that the precision has been improved by a factor of $n^{1/4}$, which is consistent with the theoretical prediction. Our quantum simulation approach may provide an intriguing platform for experimental verification of various quantum metrology schemes.
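To give a sense of scale for the $n^{1/4}$ factor quoted in this abstract, the arithmetic at the largest reported probe size can be written out. This is only a numerical evaluation of the stated scaling, not a result taken from the paper's data.

```latex
\[
  n^{1/4}\Big|_{n=7} = 7^{1/4} = \sqrt{\sqrt{7}} \approx \sqrt{2.6458} \approx 1.63
\]
```

So, under the stated scaling, the improvement at seven qubits is a factor of roughly 1.6.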

preprint | 2022 | arXiv

Experimental quantum simulation of non-Hermitian dynamical topological states using stochastic Schrödinger equation

Noise is ubiquitous in real quantum systems, leading to non-Hermitian quantum dynamics, and may affect the fundamental states of matter. Here we report an experimental quantum simulation of the two-dimensional non-Hermitian quantum anomalous Hall (QAH) model using a nuclear magnetic resonance processor. Unlike the usual experiments using auxiliary qubits, we develop a stochastic average approach based on the stochastic Schrödinger equation to realize the non-Hermitian dissipative quantum dynamics, which saves quantum simulation resources and simplifies the implementation of quantum gates. We demonstrate the stability of dynamical topology against weak noise, and observe two types of dynamical topological transitions driven by strong noise. Moreover, we observe a region in which the emergent topology is always robust regardless of the noise strength. Our work shows a feasible quantum simulation approach for dissipative quantum dynamics with the stochastic Schrödinger equation and opens a route to investigate non-Hermitian dynamical topological physics.

preprint | 2022 | arXiv

Experimental Realization of a Quantum Refrigerator Driven by Indefinite Causal Orders

Indefinite causal order (ICO) is playing a key role in recent quantum technologies. Here, we experimentally study quantum thermodynamics driven by ICO on nuclear spins using a nuclear magnetic resonance system. We realize the ICO of two thermalizing channels to exhibit how the mechanism works, and show that the working substance can be cooled or heated even though it undergoes thermal contacts with reservoirs of the same temperature. Moreover, we construct a single cycle of the ICO refrigerator based on the Maxwell's demon mechanism, and evaluate its performance by measuring the work consumption and the heat energy extracted from the low-temperature reservoir. Unlike classical refrigerators, in which the coefficient of performance (COP) is perversely higher the closer the temperatures of the high-temperature and low-temperature reservoirs are to each other, the ICO refrigerator's COP is always bounded to small values due to the non-unit success probability in projecting the ancillary qubit to the preferable subspace. To enhance the COP, we propose and experimentally demonstrate a general framework based on the density matrix exponentiation (DME) approach, as an extension to the ICO refrigeration. The COP is observed to be enhanced by more than three times with the DME approach. Our work demonstrates a new way for non-classical heat exchange, and paves the way towards construction of quantum refrigerators on a quantum system.
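The remark about classical refrigerators in this abstract can be made concrete with the standard Carnot bound on the coefficient of performance. The symbols $T_c$ and $T_h$ (cold and hot reservoir temperatures) and the numbers below are illustrative and do not come from the paper.

```latex
\[
  \mathrm{COP}_{\text{Carnot}} = \frac{Q_c}{W} = \frac{T_c}{T_h - T_c}
\]
\[
  T_c = 270\,\mathrm{K},\; T_h = 300\,\mathrm{K}: \quad \mathrm{COP} = \frac{270}{30} = 9,
  \qquad
  T_c = 270\,\mathrm{K},\; T_h = 280\,\mathrm{K}: \quad \mathrm{COP} = \frac{270}{10} = 27
\]
```

The bound grows without limit as $T_h \to T_c$, which is the classical behavior the abstract contrasts with the bounded COP of the ICO refrigerator.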

preprint | 2020 | arXiv

Geodesic Distance Field-based Curved Layer Volume Decomposition for Multi-Axis Support-free Printing

This paper presents a new curved layer volume decomposition method for multi-axis support-free printing of freeform solid parts. Given a solid model to be printed that is represented as a tetrahedral mesh, we first establish a geodesic distance field embedded on the mesh, whose value at any vertex is the geodesic distance to the base of the model. Next, the model is naturally decomposed into curved layers by interpolating a number of iso-geodesic distance surfaces (IGDSs). These IGDSs morph from bottom-up in an intrinsic and smooth way owing to the nature of geodesics, which will be used as the curved printing layers that are friendly to multi-axis printing. In addition, to cater to the collision-free requirement and to improve the printing efficiency, we also propose a printing sequence optimization algorithm for determining the printing order of the IGDSs, which helps reduce the air-move path length. Ample experiments in both computer simulation and physical printing are performed, and the experimental results confirm the advantages of our method.
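The idea of a distance field from the model base, sliced into layers by iso-values, can be sketched with a graph-based approximation. The code below runs Dijkstra's algorithm along mesh edges, which is a common stand-in for geodesic distance and is not claimed to be the method used in the paper. The function names, the edge-list input format, and the uniform `layer_thickness` are assumptions for illustration.

```python
import heapq
import math


def geodesic_distance_field(vertices, edges, base_vertices):
    """Approximate geodesic distance from a set of base vertices.

    vertices: list of (x, y, z) tuples; edges: list of (i, j) index pairs.
    Distances are shortest paths along mesh edges (Dijkstra).
    """
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        length = math.dist(vertices[a], vertices[b])
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))

    distance = [math.inf] * len(vertices)
    heap = []
    for v in base_vertices:
        distance[v] = 0.0
        heap.append((0.0, v))
    heapq.heapify(heap)

    while heap:
        d, v = heapq.heappop(heap)
        if d > distance[v]:
            continue  # stale queue entry
        for w, length in adjacency[v]:
            nd = d + length
            if nd < distance[w]:
                distance[w] = nd
                heapq.heappush(heap, (nd, w))
    return distance


def layer_index(distance, layer_thickness):
    """Assign each vertex to a layer by binning its distance value.

    Assumes every vertex is reachable from the base (finite distances).
    """
    return [int(d // layer_thickness) for d in distance]
```

For example, on a four-vertex chain with unit-length edges and vertex 0 as the base, `geodesic_distance_field` returns `[0.0, 1.0, 2.0, 3.0]`, and a layer thickness of 1.5 bins the vertices into layers `[0, 0, 1, 2]`.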

preprint | 2020 | arXiv

Multi-Axis Support-Free Printing of Freeform Parts with Lattice Infill Structures

In additive manufacturing, infill structures are commonly used to reduce the weight and cost of a solid part. Currently, most infill structure generation methods are based on the conventional 2.5-axis printing configuration, which, although able to satisfy the self-supporting condition on the infills, suffers from the well-known staircase effect on the finished surface and the need for extensive support for overhang features. In this paper, based on the emerging continuous multi-axis printing configuration, we present a new lattice infill structure generation algorithm, which is able to achieve both the self-supporting condition for the infills and the support-free requirement at the boundary surface of the part. The algorithm critically relies on the use of three mutually orthogonal geodesic distance fields that are embedded in the tetrahedral mesh of the solid model. The intersection between the iso-geodesic distance surfaces of these three geodesic distance fields naturally forms the desired lattice of infill structure, while the density of the infills can be conveniently controlled by adjusting the iso-values. The lattice infill pattern in each curved slicing layer is trimmed to conform to an Eulerian graph so as to generate a continuous printing path, which can effectively reduce the nozzle retractions during the printing process. In addition, to cater to the collision-free requirement and to improve the printing efficiency, we also propose a printing sequence optimization algorithm for determining a collision-free order of printing of the connected lattice infills, which seeks to reduce the air-move length of the nozzle. Ample experiments in both computer simulation and physical printing are performed, and the results give a preliminary confirmation of the advantages of our methodology.
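The Eulerian-graph condition mentioned in this abstract explains why a continuous printing path exists: a connected graph in which every vertex has even degree admits a closed walk that uses each edge exactly once. The sketch below only checks that condition for an undirected edge list. It does not reproduce the paper's trimming procedure, and the function name and input format are assumptions.

```python
from collections import defaultdict


def has_eulerian_circuit(edges):
    """Return True if the undirected graph given by `edges` is Eulerian.

    Condition: all vertices with edges have even degree, and those
    vertices form a single connected component.
    """
    degree = defaultdict(int)
    adjacency = defaultdict(set)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        adjacency[a].add(b)
        adjacency[b].add(a)

    if not degree:
        return True  # empty graph: trivially Eulerian
    if any(d % 2 for d in degree.values()):
        return False

    # Depth-first search to confirm connectivity.
    start = next(iter(degree))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(degree)
```

A square loop of four edges passes the check, while an open two-edge path fails it because its endpoints have odd degree. In a printing context, a failed check would mean the nozzle has to retract or retrace at least once.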