Source author record

Xiaoyuan Li

Xiaoyuan Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

hep-ph · Computation and Language · Human-Computer Interaction · Quantitative Methods

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators


Preprint · 2026 · arXiv

On Predicting the Post-training Potential of Pre-trained LLMs

The performance of Large Language Models (LLMs) on downstream tasks is fundamentally constrained by the capabilities acquired during pre-training. However, traditional benchmarks such as MMLU often fail to reflect a base model's plasticity in complex open-ended scenarios, leading to inefficient model selection. We address this by introducing a new task, predicting post-training potential: forecasting a base model's performance before post-training. We propose RuDE (Rubric-based Discriminative Evaluation), a unified framework that bypasses the generation gap of base models by leveraging response discrimination. Guided by our systematic 4C Taxonomy, RuDE constructs controlled contrastive pairs across diverse domains through fine-grained rubric violations. Extensive experiments demonstrate a correlation greater than 90% with post-training performance. Crucially, validation via Reinforcement Learning (RL) confirms that RuDE effectively identifies high-potential smaller models that outperform larger counterparts, offering a compute-efficient mechanism for foundation model development.

Preprint · 2026 · arXiv

SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation

Large Language Models (LLMs) achieve strong performance on standard knowledge evaluation benchmarks, yet recent work shows that their knowledge capabilities remain brittle under question variants that test the same knowledge in different forms. Robustness augmentation of existing knowledge evaluation benchmarks is therefore necessary, but current LLM-assisted generate-then-verify pipelines are costly and difficult to scale due to low-yield variant generation and unreliable variant verification. We propose SAGE (Scalable Automated Generation of Robustness BEnchmarks), a framework for scalable robustness augmentation of knowledge evaluation benchmarks using fine-tuned smaller models. SAGE consists of VariantQual, a rubric-based verifier trained on human-labeled seed data, and VariantGen, a variant generator initialized with supervised fine-tuning and further optimized with reinforcement learning using VariantQual as the reward model. Experiments on HellaSwag show that SAGE constructs a large-scale robustness-augmented benchmark with quality comparable to the human-annotated HellaSwag-Pro at substantially lower cost, while the fine-tuned models further generalize to MMLU without benchmark-specific fine-tuning.

Preprint · 2026 · arXiv

SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

Skill libraries enable large language model agents to reuse experience from past interactions, but most existing libraries store skills as isolated entries and retrieve them only by semantic similarity. This design creates two key challenges for compositional tasks. First, an agent must identify not only the relevant skills but also how they depend on and build upon each other. Second, library maintenance becomes difficult, since the system lacks structural cues for deciding when skills should be merged, split, or removed. We propose SkillGraph, a framework that represents reusable skills as nodes in a directed graph, with typed edges encoding prerequisite, enhancement, and co-occurrence relations. Given a new task, SkillGraph retrieves not just individual skills but an ordered skill subgraph that can guide multi-step decision making. The graph is continuously updated from agent trajectories and reinforcement learning feedback, allowing both the skill library and the agent policy to improve together. Experiments on ALFWorld, WebShop, and seven search-augmented QA tasks show that SkillGraph achieves state-of-the-art performance against memory-augmented RL methods, with especially large gains on complex tasks that require composing multiple skills.
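The abstract describes retrieving an ordered skill subgraph from a graph with typed edges. As a rough illustration only, and not the authors' implementation, the Python sketch below stores prerequisite edges and orders a requested set of skills, plus their transitive prerequisites, with a depth-first topological sort so that each prerequisite precedes the skill that depends on it. The class `ToySkillGraph`, the method `ordered_subgraph`, and the ALFWorld-style skill names are hypothetical; the paper's actual retrieval, scoring, and graph-update procedures are not reproduced here.

```python
from collections import defaultdict


class ToySkillGraph:
    """Hypothetical directed skill graph with typed edges (illustration only)."""

    def __init__(self):
        # skill -> set of skills that must be executed before it
        self.prereqs = defaultdict(set)
        # Enhancement / co-occurrence edges are stored but do not constrain order here.
        self.other_edges = defaultdict(set)

    def add_edge(self, src, dst, edge_type):
        if edge_type == "prerequisite":
            self.prereqs[dst].add(src)  # src must come before dst
        elif edge_type in ("enhancement", "co-occurrence"):
            self.other_edges[src].add((dst, edge_type))
        else:
            raise ValueError(f"unknown edge type: {edge_type!r}")

    def ordered_subgraph(self, seeds):
        """Return the seed skills plus their transitive prerequisites,
        ordered so every prerequisite precedes its dependents (DFS topological sort)."""
        order, visiting, done = [], set(), set()

        def visit(skill):
            if skill in done:
                return
            if skill in visiting:
                raise ValueError(f"prerequisite cycle involving {skill!r}")
            visiting.add(skill)
            # sorted() makes the output deterministic for ties
            for dep in sorted(self.prereqs.get(skill, ())):
                visit(dep)
            visiting.remove(skill)
            done.add(skill)
            order.append(skill)

        for seed in seeds:
            visit(seed)
        return order


# Usage with made-up ALFWorld-style skills
g = ToySkillGraph()
g.add_edge("find_item", "pick_up", "prerequisite")
g.add_edge("pick_up", "heat_item", "prerequisite")
g.add_edge("open_microwave", "heat_item", "prerequisite")
g.add_edge("heat_item", "place_item", "prerequisite")
g.add_edge("find_item", "open_microwave", "co-occurrence")
plan = g.ordered_subgraph(["place_item"])
print(plan)  # ['open_microwave', 'find_item', 'pick_up', 'heat_item', 'place_item']
```

In this toy version only prerequisite edges determine order; a fuller system would presumably also use enhancement and co-occurrence edges when scoring candidate skills or deciding when to merge, split, or remove them.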

Preprint · 2022 · arXiv

Mechanism, measurement, and quantification of stress in decision process: a model based systematic-review protocol

Every human action begins with decision-making. Stress is a significant source of biases that can influence human decision-making. To understand the relationship between stress and decision-making, stress quantification is fundamental. The literature describes different methods of measuring and quantifying stress in decision-making, but an up-to-date systematic review of these methods is lacking. Moreover, mental stress, mental effort, cognitive workload, and workload are often used interchangeably but should be distinguished to enable in-depth investigations of decision mechanisms. Our objectives are to clarify stress-related concepts and to review the measurement, quantification, and application of stress during decision-making activities.

Preprint · 2001 · arXiv

Non-unitarity of CKM matrix from the vector singlet quark mixing and neutron electric dipole moment

In the standard model (SM), the lowest-order contribution to the quark electric dipole moment (EDM) occurs at the three-loop level. We show that the non-unitarity of the CKM matrix in models with an extended quark sector typically gives rise to a quark EDM at the two-loop level that has no GIM-like suppression factors except the external quark mass. The induced neutron EDM is of order $10^{-29}\,e\,\mathrm{cm}$ and can be well within the reach of the next generation of experiments if it is further enhanced by long-distance physics, as happens in the SM.

Preprint · 1999 · arXiv

Enhanced contribution to quark and neutron electric dipole moments with small mixing of right-handed currents and CKM CP violation

We study the light-quark and neutron electric dipole moments (EDMs) under the assumptions that the CP source is still in the usual CKM matrix and that there is a small mixing of right-handed charged currents in the quark sector. We find that the EDMs already arise at two-loop order and are much larger than the standard model (SM) result, even for small mixing.

Preprint · 1999 · arXiv

Vanishing Contribution to Quark Electric Dipole Moment in the 2HD Model with CKM CP Violation

In the standard model (SM) of electroweak interactions, CP noninvariance arises from the nonzero phase in the CKM matrix. Its contribution to the quark electric dipole moment (EDM) surprisingly vanishes at two-loop order, which makes the quark EDM extremely small in the SM. In this paper, we consider the two-Higgs-doublet extension of the SM and assume that CP noninvariance is still encoded in the CKM matrix. We calculate the charged Higgs boson contribution to the quark EDM, which naively should be of order $e G_F^2 \tilde{\delta} (4\pi)^{-4} m_{u(d)} m_t^2 m_b^2 m_H^{-2}$ for the up (down) quark, with possible enhancement factors of $\tan^2\beta$. Here $\tilde{\delta}$ is the rephasing invariant of CP violation. Contrary to this naive expectation, however, we find that the charged Higgs boson contribution vanishes strictly at two-loop order. We show explicitly how this comes about and explain how it is related to the general form of Yukawa couplings in a spontaneously broken gauge theory.

Preprint · 1996 · arXiv

$O(\alpha^2 G_F m_t^2)$ Contributions to $H\to\gamma\gamma$

The rare decay $H\to\gamma\gamma$ is a promising detection channel for an intermediate mass Higgs boson. We compute its two-loop $O(\alpha^2 G_F m_t^2)$ correction in the standard model and find that the relative correction to the decay rate runs between $0.7\%$ and $0.5\%$ for $80 \le M_H \le 150$ GeV. The analogous correction to the amplitude for $gg\to H$ is recovered as a special case. The generalization of our result to other models is also briefly indicated.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint