Researcher profile

Gang Li

Gang Li contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging); Verification L1; unclaimed author
10 works
0 followers
14 topics
4 close collaborators

Published work

10 published items

Preprint, 2026, arXiv

Charge disproportionation as a possible mechanism towards polar antiferromagnetic metal in molecular orbital crystal

Polar antiferromagnetic metals have recently garnered increasing interest due to their combined traits of both ferromagnets and antiferromagnets for spintronic applications. However, the inherently incompatible nature of antiferromagnetism, metallicity and polarity poses a significant challenge. We propose that charge disproportionation can lead to this novel state in the negative charge transfer gap regime in a molecular orbital crystal, based on molecular orbital analyses of the first-principles DFT+$U$ electronic band structure for the representative Ruddlesden-Popper bilayer perovskite oxide Sr$_3$Co$_2$O$_7$, corroborated by Density Matrix Renormalization Group calculations. Due to the negative charge transfer nature of Co$^{4+}$ and imposed by strong interlayer coupling, localized molecular orbitals stemming from the hybridization of Co $d_{z^2}$ and $d_{xz/yz}$ orbitals through the apical oxygen $p$ orbitals preferably emerge within each bilayer unit and develop antiferromagnetic ordering by invoking Hubbard repulsion. Charge disproportionation driven by Hund's physics creates an occupation imbalance with broken inversion symmetry in the remaining $d_{xy}$ and $d_{x^2-y^2}$ orbitals from distinct Co atoms within the bilayer unit, resulting in the polar metallicity. Meanwhile, this charge disproportionation scenario allows the resulting conducting carriers to couple with interlayer local spins via Hund's coupling, giving rise to in-plane double-exchange ferromagnetism. Our molecular orbital formulation further provides a guide towards an effective Hamiltonian for modelling the unconventional synergy of metallicity, polarity and antiferromagnetism in Sr$_3$Co$_2$O$_7$, which may be a unified framework widely applicable to double-layer Ruddlesden-Popper perovskite oxides.

Preprint, 2026, arXiv

Dipion transitions from $X(3872)$ to $χ_{cJ}\ (J=0,1,2)$

In this work, we investigate the dipion transition processes $X(3872)\to ππχ_{cJ}\ (J=0,1,2)$ within the framework of heavy hadron chiral perturbation theory, treating $X(3872)$ as a molecular state composed of $D\bar{D}^*$ + H.c. components. By analyzing the box and triangle loop diagrams with the nonrelativistic effective field theory power-counting rule, we demonstrate that box diagrams dominate these dipion transition processes. Branching ratios are calculated as functions of the mixing angle $θ$, which parametrizes the neutral and charged meson compositions of the $X(3872)$. Our results indicate that the branching fractions for $X(3872)\to ππχ_{c0}$, $X(3872)\to ππχ_{c1}$, and $X(3872)\to ππχ_{c2}$ are of the orders of $10^{-4}$, $10^{-3}$, and $10^{-5}$, respectively. We also predict the ratios ${\mathcal{B}[X(3872)\rightarrow ππχ_{c0/2}]}/{\mathcal{B}[X(3872)\rightarrow ππχ_{c1}]}$ and ${\mathcal{B}[X(3872)\rightarrow π^+π^-χ_{cJ}]}/{\mathcal{B}[X(3872)\rightarrow π^0π^0χ_{cJ}]}$. The latter deviates from isospin-symmetry expectations, revealing various degrees of isospin violation. By studying the $π^+π^-$ and $π^+χ_{cJ}$ invariant mass spectra, we find a double-bump structure in the $π^+π^-$ invariant mass distribution of the process $X(3872)\to π^+π^-χ_{c1}$ and in the $π^+χ_{c0}$ invariant mass distribution of the process $X(3872)\to π^+π^-χ_{c0}$, which could be tested by future experimental measurements.

Preprint, 2026, arXiv

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

The recent success and openness of DeepSeek-R1 have brought widespread attention to Group Relative Policy Optimization (GRPO) as a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias. We also identify a connection between GRPO and traditional discriminative methods in supervised learning. Motivated by these insights, we introduce a new Discriminative Constrained Optimization (DisCO) framework for reinforcing LRMs, grounded in the principle of discriminative learning. The main differences between DisCO and GRPO and its recent variants are: (1) it replaces the group relative objective with a discriminative objective defined by a scoring function; (2) it abandons clipping-based surrogates in favor of non-clipping RL surrogate objectives used as scoring functions; (3) it employs a simple yet effective constrained optimization approach to enforce the KL divergence constraint. As a result, DisCO offers notable advantages over GRPO and its variants: (i) it completely eliminates difficulty bias by adopting discriminative objectives; (ii) it addresses the entropy instability in GRPO and its variants through the use of non-clipping scoring functions and a constrained optimization approach, yielding long and stable training dynamics; (iii) it allows the incorporation of advanced discriminative learning techniques to address data imbalance, where a significant number of questions have more negative than positive generated answers during training. Our experiments on enhancing the mathematical reasoning capabilities of SFT-finetuned models show that DisCO significantly outperforms GRPO and its improved variants such as DAPO, achieving average gains of 7% over GRPO and 6% over DAPO across six benchmark tasks for a 1.5B model.
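
As a rough illustration of the binary-reward setting mentioned in this abstract, the sketch below computes the standard GRPO-style group-relative advantage, i.e. each reward standardized within the group of answers sampled for one question. It is not the DisCO method and is not taken from the paper's code. The function names, the use of the population standard deviation, and the zero-signal handling for uniform groups are assumptions made for this example. With binary rewards, the scale of each advantage depends only on the fraction p of correct answers, which is one way to see how question difficulty enters the objective.

```python
import math


def group_relative_advantages(rewards):
    """Standardize rewards within one question's group of sampled answers.

    Assumption: population standard deviation, no epsilon term.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    if std == 0.0:
        # All answers correct or all wrong: the group carries no signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]


def positive_answer_advantage(p):
    """Closed form for a correct answer when a fraction p (0 < p < 1) is correct."""
    return math.sqrt((1.0 - p) / p)


# One hard question: 1 of 4 sampled answers is correct (p = 0.25).
hard = group_relative_advantages([1, 0, 0, 0])
# One easy question: 3 of 4 sampled answers are correct (p = 0.75).
easy = group_relative_advantages([1, 1, 1, 0])
print(hard[0], easy[0])
```

The correct answer to the hard question receives a larger advantage than a correct answer to the easy one, so the per-answer weighting varies with the question's pass rate.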

Preprint, 2026, arXiv

GLAP: General contrastive audio-text pretraining across domains and languages

Contrastive Language Audio Pretraining (CLAP) is a widely-used method to bridge the gap between audio and text domains. Current CLAP methods enable sound and music retrieval in English, ignoring multilingual spoken content. To address this, we introduce general language audio pretraining (GLAP), which expands CLAP with multilingual and multi-domain abilities. GLAP demonstrates its versatility by achieving competitive performance on standard audio-text retrieval benchmarks like Clotho and AudioCaps, while significantly surpassing existing methods in speech retrieval and classification tasks. Additionally, GLAP achieves strong results on widely used sound-event zero-shot benchmarks, while simultaneously outperforming previous methods on speech content benchmarks. Further keyword spotting evaluations across 50 languages emphasize GLAP's advanced multilingual capabilities. Finally, multilingual sound and music understanding is evaluated across four languages. Checkpoints and Source: https://github.com/xiaomi-research/dasheng-glap.
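
For readers unfamiliar with contrastive audio-text pretraining, the sketch below shows a generic CLAP-style symmetric softmax contrastive loss over a batch of paired embeddings. The abstract does not state which loss GLAP uses, so this illustrates the general technique only. The function name, the temperature value, and the toy embeddings are assumptions for this example.

```python
import numpy as np


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine-similarity logits.

    Row i of audio_emb is assumed to be paired with row i of text_emb.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature  # shape: (batch, batch)

    def cross_entropy_on_diagonal(l):
        # Numerically stable log-softmax per row; the target is the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    audio_to_text = cross_entropy_on_diagonal(logits)
    text_to_audio = cross_entropy_on_diagonal(logits.T)
    return 0.5 * (audio_to_text + text_to_audio)


# Toy batch: perfectly aligned pairs versus deliberately mismatched pairs.
emb = np.eye(4)
loss_matched = symmetric_contrastive_loss(emb, emb)
loss_mismatched = symmetric_contrastive_loss(emb, np.roll(emb, 1, axis=0))
print(loss_matched, loss_mismatched)
```

Aligned pairs give a loss near zero, while mismatched pairs are penalized heavily, which is the signal that pulls matching audio and text together in the shared embedding space.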

Preprint, 2026, arXiv

Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs) in domains such as mathematics and coding. However, its application to knowledge-intensive domains has not been effectively explored due to the scarcity of high-quality verifiable data. Furthermore, current RLVR focuses solely on the correctness of final answers, which leads to flawed reasoning and sparse reward signals. In this work, we propose Knowledge-to-Verification (K2V), a framework that extends RLVR to knowledge-intensive domains through automated verifiable data synthesis, while enabling verification of the LLM's reasoning process. Extensive experiments demonstrate that K2V enhances the reasoning of LLMs in knowledge-intensive domains without significantly compromising the model's general capabilities. This study also suggests that integrating automated data synthesis with reasoning verification is a promising direction to enhance model capabilities in these broader domains. Code is available at https://github.com/SeedScientist/K2V.

Preprint, 2026, arXiv

Multiple nodal superconducting phases and order-parameter evolution in pressurized UTe$_2$

Spin-triplet superconductivity (SC) offers a unique avenue for realizing non-Abelian Majorana zero modes and thus fault-tolerant topological quantum computation, and has attracted a broad audience for both fundamental research and potential applications. The recently discovered heavy-fermion spin-triplet superconductor candidate UTe$_2$ has sparked great interest for its ultrahigh upper critical field and reentrant SC phases in proximity to a field-polarized magnetic state. Despite extensive studies on the phase diagrams and competing orders induced by pressure and magnetic field, little is known about its SC order parameters and their evolution with these control parameters, largely due to the lack of appropriate symmetry-sensitive probes. Here, we report comprehensive point-contact spectroscopy measurements of pressurized UTe$_2$ on the (0 0 1) surface. The observation of an Andreev bound state strongly suggests the presence of a $p_z$ component in the SC order parameters. Quantitative analysis based on an extended Blonder-Tinkham-Klapwijk model unveils $B_{2u}$ or $B_{3u}$ as the most likely representation for both ambient and pressurized UTe$_2$, and remarkably, the multiple SC phases can be distinguished by a single parameter $\langle Δ_{z}\rangle/\langle Δ_{x(y)}\rangle$, the relative weight between the $p_z$-wave and $p_{x(y)}$-wave pairings. These findings not only impose stringent constraints on the superconducting order parameter in UTe$_2$, but also provide key spectroscopic evidence for the existence of multiple SC phases tuned through pressure.

Preprint, 2026, arXiv

Parameter Convergence Radar Detector Based on VAMP Deep Unfolding

Compared with the sparse recovery process in traditional compressed sensing (CS) radar detector CAMP, vector AMP deep unfolding (VAMP-DU) can achieve sparse recovery over a broader range of observation matrices, with faster convergence speed and higher recovery accuracy. However, the distribution of the error term in VAMP-DU remains unknown, which renders the distribution of the test statistic in CS radar detection undetermined and thus hinders threshold setting under a given false alarm rate when VAMP-DU is applied to CS radar detection. In this work, we theoretically prove that the error term in VAMP-DU follows a Gaussian distribution by leveraging a general state evolution (SE). Based on the Gaussianity, we propose a new parameter convergence radar detector (PCRD) as the CS detector to calculate the distribution parameter of the test statistic and realize target detection under a given false alarm rate. Specifically, PCRD exploits the Gaussian property of error term in VAMP-DU to exhibit superior false alarm control capability, while leveraging the improved recovery accuracy of VAMP-DU to further enhance target detection performance. Numerical simulations validate the Gaussianity of the error term in VAMP-DU and show the superiority of the VAMP-DU-based PCRD over existing approaches in both false alarm control accuracy and target detection performance.

Preprint, 2026, arXiv

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Agentic reinforcement learning (RL) holds great promise for the development of autonomous agents under complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is treated as a passive, post-hoc process: a verifier (i.e., rule-based scoring script, reward or critic model, and LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine if the agent succeeds. Such processing of verbose context that contains irrelevant, noisy history poses challenges to the verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from this passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: to not only complete a task but also to prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification on a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier to determine its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner, bringing performance gains of up to 26.08% and 16.66% to 8B and 30B models, respectively. The synergy between solution finding and evidence seeking facilitates the cultivation of efficient, self-verifying agents with competitive performance against DeepSeek V3.1 and Qwen3-235B-A22B. Code is available at: https://github.com/TencentYoutuResearch/SmartSnap

Preprint, 2026, arXiv

Uncertainty Analysis of Experimental Parameters for Reducing Warpage in Injection Molding

Injection molding is a critical manufacturing process, but controlling warpage remains a major challenge due to complex thermomechanical interactions. Simulation-based optimization is widely used to address this, yet traditional methods often overlook the uncertainty in model parameters. In this paper, we propose a data-driven framework to minimize warpage and quantify the uncertainty of optimal process settings. We employ polynomial regression models as surrogates for the injection molding simulations of a box-shaped part. By adopting a Bayesian framework, we estimate the posterior distribution of the regression coefficients. This approach allows us to generate a distribution of optimal decisions rather than a single point estimate, providing a measure of solution robustness. Furthermore, we develop a Monte Carlo-based boundary analysis method. This method constructs confidence bands for the zero-level sets of the response surfaces, helping to visualize the regions where warpage transitions between convex and concave profiles. We apply this framework to optimize four key process parameters: mold temperature, injection speed, packing pressure, and packing time. The results show that our approach finds stable process settings and clearly marks the boundaries of defects in the parameter space.
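
The idea of producing a distribution of optimal decisions rather than a single point estimate can be sketched as follows. This is a minimal illustration under strong assumptions and not the paper's implementation: it uses one hypothetical process parameter instead of four, a quadratic surrogate, synthetic data in place of simulation output, and a conjugate Gaussian prior with known noise variance. All names and numeric values are invented for the example.

```python
import numpy as np


def coefficient_posterior(X, y, noise_var, prior_var):
    """Gaussian posterior over regression coefficients.

    Assumes a zero-mean isotropic Gaussian prior and known noise variance.
    """
    precision = X.T @ X / noise_var + np.eye(X.shape[1]) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov


def optimal_settings(coef_samples, grid):
    """For each sampled surrogate, return the grid point with minimal response."""
    design = np.vander(grid, 3, increasing=True)  # columns: 1, x, x^2
    predictions = coef_samples @ design.T  # shape: (n_samples, n_grid)
    return grid[np.argmin(predictions, axis=1)]


rng = np.random.default_rng(0)

# Synthetic "warpage" response with its minimum at x = 0.3 (hypothetical data).
x = rng.uniform(-1.0, 1.0, 40)
y = (x - 0.3) ** 2 + rng.normal(0.0, 0.05, x.size)

X = np.vander(x, 3, increasing=True)
mean, cov = coefficient_posterior(X, y, noise_var=0.05 ** 2, prior_var=10.0)

# Each posterior draw is one plausible surrogate; each yields its own optimum.
samples = rng.multivariate_normal(mean, cov, size=2000)
optima = optimal_settings(samples, np.linspace(-1.0, 1.0, 401))
print(optima.mean(), optima.std())
```

The spread of `optima` gives a simple measure of how robust the recommended setting is to uncertainty in the fitted coefficients.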

Preprint, 2025, arXiv

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive manual effort in tool integration and prompt engineering, while deployed agents struggle to adapt to dynamic environments without expensive fine-tuning. To address these issues, we propose Youtu-Agent, a modular framework designed for the automated generation and continuous evolution of LLM agents. Youtu-Agent features a structured configuration system that decouples execution environments, toolkits, and context management, enabling flexible reuse and automated synthesis. We introduce two generation paradigms: a Workflow mode for standard tasks and a Meta-Agent mode for complex, non-standard requirements, capable of automatically generating tool code, prompts, and configurations. Furthermore, Youtu-Agent establishes a hybrid policy optimization system: (1) an Agent Practice module that enables agents to accumulate experience and improve performance through in-context optimization without parameter updates; and (2) an Agent RL module that integrates with distributed training frameworks to enable scalable and stable reinforcement learning of any Youtu-Agents in an end-to-end, large-scale manner. Experiments demonstrate that Youtu-Agent achieves state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models. Our automated generation pipeline achieves over 81% tool synthesis success rate, while the Practice module improves performance on AIME 2024/2025 by +2.7% and +5.4% respectively. Moreover, our Agent RL training achieves 40% speedup with steady performance improvement on 7B LLMs, enhancing coding/reasoning and searching capabilities respectively up to 35% and 21% on Maths and general/multi-hop QA benchmarks.