Source author record

Yucong Huang

Yucong Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence; Computation and Language; Computer Science and Game Theory; Machine Learning; math-ph; math.AP; math.MP; Multiagent Systems; physics.flu-dyn

Catalog footprint

What is connected

3 works

9 topics

4 close collaborators

Preprint · 2026 · arXiv

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

For solving zero-sum games involving non-transitivity, a useful approach is to maintain a policy population to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracles (PSRO) algorithm is an effective framework for solving such games. However, current methods initialize a new policy from scratch or inherit a single historical policy in Best Response (BR), missing the opportunity to leverage past policies to generate a better BR. In this paper, we propose Fusion-PSRO, which employs Nash Policy Fusion to initialize a new policy for BR training. Nash Policy Fusion serves as an implicit guiding policy that starts exploration on the current Meta-NE, thus providing a closer approximation to BR. Moreover, it insightfully captures a weighted moving average of past policies, dynamically adjusting these weights based on the Meta-NE in each iteration. This cumulative process further enhances the policy population. Empirical results on classic benchmarks show that Fusion-PSRO achieves lower exploitability, thereby mitigating the shortcomings of previous research on policy initialization in BR.
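The abstract describes Nash Policy Fusion as a weighted average of past policies, with weights taken from the current Meta-NE. A minimal sketch of that idea follows. It is not the paper's implementation: it assumes each policy can be represented as a flat list of parameters, and the names `fuse_policies`, `population`, and `meta_ne` are hypothetical.

```python
def fuse_policies(params, meta_ne):
    """Average flat parameter vectors, weighted by Meta-NE probabilities.

    Assumption: every policy is a list of floats of the same length.
    The result would be used only to initialize best-response training.
    """
    if len(params) != len(meta_ne):
        raise ValueError("need one Meta-NE weight per policy")
    total = sum(meta_ne)
    if total <= 0:
        raise ValueError("Meta-NE weights must sum to a positive value")
    # Normalize so the weights form a probability distribution.
    weights = [w / total for w in meta_ne]
    dim = len(params[0])
    return [sum(w * p[i] for w, p in zip(weights, params)) for i in range(dim)]


# Hypothetical population of three policies with two parameters each.
population = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
meta_ne = [0.5, 0.25, 0.25]
fused = fuse_policies(population, meta_ne)
print(fused)  # [0.75, 0.5]
```

Because the Meta-NE is recomputed in each PSRO iteration, the weights in such an average would change from one iteration to the next, which is the "dynamically adjusting" behavior the abstract refers to.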

Preprint · 2026 · arXiv

Transitivity Meets Cyclicity: Explicit Preference Decomposition for Dynamic Large Language Model Alignment

Standard RLHF relies on transitive scalar rewards, failing to capture the cyclic nature of human preferences. While some approaches like the General Preference Model (GPM) address this, we identify a theoretical limitation: their implicit formulation entangles hierarchy with cyclicity, failing to guarantee dominant solutions. To address this, we propose the Hybrid Reward-Cyclic (HRC) model, which utilizes game-theoretic decomposition to explicitly disentangle preferences into orthogonal transitive (scalar) and cyclic (vector) components. Complementing this, we introduce Dynamic Self-Play Preference Optimization (DSPPO), which treats alignment as a time-varying game to progressively guide the policy toward the Nash equilibrium. Synthetic data experiments further validate HRC's structural superiority in mixed transitive-cyclic settings, where HRC converges faster and achieves higher accuracy than GPM. Experiments on RewardBench 2 demonstrate that HRC consistently improves over both BT and GPM baselines (e.g., +1.23% on Gemma-2B-it). In particular, its superior performance in the Ties domain empirically validates the model's robustness in handling complex, non-strict preferences. Extensive downstream evaluations on AlpacaEval 2.0, Arena-Hard-v0.1, and MT-Bench confirm the efficacy of our framework. Notably, when using Gemma-2B-it as the base preference model, HRC+DSPPO achieves a peak length-controlled win-rate of 44.75% on AlpacaEval 2.0 and 46.8% on Arena-Hard-v0.1, significantly outperforming SPPO baselines trained with BT or GPM. Our code is publicly available at https://github.com/lab-klc/Hybrid-Reward-Cyclic.
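The split into a transitive scalar component and a cyclic vector component can be illustrated with a toy preference score. The sketch below is an assumption made for illustration, not the HRC formulation from the paper: it scores a pair as a difference of scalar rewards plus a skew-symmetric bilinear term on 2-D vector blocks, then applies a sigmoid. The function and variable names are hypothetical.

```python
import math


def cyclic_term(vx, vy):
    # Skew-symmetric form vx^T J vy over 2-D blocks, with J = [[0, -1], [1, 0]].
    # Swapping the arguments flips the sign, so this term cannot rank items.
    return sum(vx[i + 1] * vy[i] - vx[i] * vy[i + 1] for i in range(0, len(vx), 2))


def preference_prob(rx, vx, ry, vy):
    # Transitive part: scalar reward difference. Cyclic part: skew-symmetric term.
    score = (rx - ry) + cyclic_term(vx, vy)
    return 1.0 / (1.0 + math.exp(-score))


def unit(angle_deg):
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a)]


# Rock-paper-scissors: equal scalar rewards, vectors spaced 120 degrees apart.
items = {
    "rock": (0.0, unit(0)),
    "paper": (0.0, unit(120)),
    "scissors": (0.0, unit(240)),
}


def p(x, y):
    (rx, vx), (ry, vy) = items[x], items[y]
    return preference_prob(rx, vx, ry, vy)


for winner, loser in [("paper", "rock"), ("scissors", "paper"), ("rock", "scissors")]:
    print(winner, "over", loser, round(p(winner, loser), 3))
```

With all scalar rewards equal, each item beats exactly one other item, which a scalar reward alone cannot represent. Raising one item's scalar reward far enough would make it preferred over both others, which is the transitive behavior.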

Preprint · 2022 · arXiv

Global Spherically Symmetric Solutions of the Multidimensional Full Compressible Navier-Stokes Equations with Large Data

We establish the global-in-time existence of solutions of the Cauchy problem for the full Navier-Stokes equations for compressible heat-conducting flow in multidimensions with initial data that are large, discontinuous, spherically symmetric, and away from the vacuum. The solutions obtained here are of global finite total relative-energy including the origin, while cavitation may occur as balls centred at the origin of symmetry for which the interfaces between the fluid and the vacuum must be upper semi-continuous in space-time in the Eulerian coordinates. On any region strictly away from the possible vacuum, the velocity and specific internal energy are Hölder continuous, and the density has a uniform upper bound. To achieve these, our main strategy is to regard the Cauchy problem as the limit of a series of carefully designed initial-boundary value problems that are formulated in finite annular regions. For such approximation problems, we can derive uniform a priori estimates that are independent of both the inner and outer radii of the annuli considered in the spherically symmetric Lagrangian coordinates. The entropy inequality is recovered after taking the limit of the outer radius to infinity by using Mazur's lemma and the convexity of the entropy function, which is required for the limit of the inner radius tending to zero. Then the global weak solutions of the original problem are attained via careful compactness arguments applied to the approximate solutions in the Eulerian coordinates.