Source author record

Wenwen Zhao

Wenwen Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Artificial Intelligence, Machine Learning, physics.flu-dyn

Catalog footprint

What is connected

2 works

3 topics

4 close collaborators

preprint | 2026 | arXiv

EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance

Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit assignment failures: uniform token-level granularity that ignores heterogeneous informational value, uniform polarity that penalizes correct steps and rewards incorrect ones, and zero-variance collapse that erases outcome-driven gradients. We systematically quantify these failures, revealing highly non-uniform token informativeness, widespread step-level polarity misalignment, and substantial training waste. To address these limitations, we propose Entropy-Progress Aligned GRPO (EP-GRPO), a framework that mines the model's intrinsic information flow for dense, self-supervised guidance. EP-GRPO integrates three components: entropy-gated modulation to prioritize high-entropy decision pivots; implicit process signals from policy divergence, anchored to outcome advantages, for directional token-level feedback without external reward models; and cumulative entropy mapping that enables progress-aligned advantage normalization, naturally maintaining gradient flow under zero reward variance. Extensive experiments on mathematical reasoning benchmarks demonstrate that EP-GRPO achieves superior accuracy and efficiency compared to GRPO and its variants. The code will be available.
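
The zero-variance collapse named in the abstract can be illustrated with a minimal sketch of the standard group-relative advantage used by GRPO, in which each sampled response's reward is normalized by the mean and standard deviation of its group. This is not the EP-GRPO code, which the abstract says is not yet released; the function name `group_relative_advantages`, the `eps` stabilizer, and the example rewards are assumptions made for illustration.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Hypothetical helper: advantage_i = (r_i - mean) / (std + eps),
    the usual group-relative form. `eps` is an assumed stabilizer.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Mixed outcomes: correct responses get positive advantages,
# incorrect ones get negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Identical outcomes: every advantage is zero, so the group
# contributes no outcome-driven gradient (zero-variance collapse).
collapsed = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Every token in a response shares the same scalar advantage in this form, which is the uniform token-level granularity the abstract criticizes.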

preprint | 2016 | arXiv

A new coupled computational method in conjunction with three-dimensional finite volume schemes for nonlinear coupled constitutive relations

Non-equilibrium effects play a vital role in high-speed and rarefied gas flows, and the accurate simulation of these flow regimes is far beyond the capability of the near-local-equilibrium Navier-Stokes-Fourier (NSF) equations. Eu proposed generalized hydrodynamic equations, consistent with the laws of irreversible thermodynamics, to solve this problem. Based on Eu's generalized hydrodynamic equations, a computational model, namely the nonlinear coupled constitutive relations (NCCR), was developed by R. S. Myong and applied successfully to one-dimensional shock wave structure and two-dimensional rarefied flows. In this paper, finite volume schemes, including the LU-SGS time advance scheme, MUSCL interpolation and the AUSMPW+ scheme, are adopted for the first time to investigate the NCCR model's validity and potential in three-dimensional complex hypersonic rarefied gas flows. Moreover, in order to solve the computational stability problems in 3D complex flows, a modified solution is developed for the NCCR model. Finally, the modified solution is tested for a complex slip flow over a 3D hollow cylinder-flare configuration. The numerical results show that the NCCR model with the modified solution yields solutions in better agreement with the DSMC results and experimental data than the NSF equations, and suggest the NCCR model's great potential for further application.
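
Of the finite volume ingredients listed in the abstract, MUSCL interpolation is the easiest to show in isolation. The sketch below reconstructs left and right interface states in one dimension from cell averages using limited slopes. The minmod limiter, the 1D setting, and the names `minmod` and `muscl_interface_states` are assumptions chosen for illustration; the abstract does not state which limiter or variable set the paper uses.

```python
def minmod(a, b):
    """Return the smaller-magnitude slope, or zero at a local extremum."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_interface_states(u):
    """Second-order MUSCL reconstruction for a list of cell averages.

    Returns (left, right), where left[i] and right[i] are the states
    on either side of the interface between cells i and i + 1.
    Boundary cells use a zero slope (an assumed simplification).
    """
    n = len(u)
    slopes = [0.0] * n
    for i in range(1, n - 1):
        slopes[i] = minmod(u[i] - u[i - 1], u[i + 1] - u[i])
    left = [u[i] + 0.5 * slopes[i] for i in range(n - 1)]
    right = [u[i + 1] - 0.5 * slopes[i + 1] for i in range(n - 1)]
    return left, right


# Smooth (linear) data: interior interface states agree from both sides.
smooth_left, smooth_right = muscl_interface_states([0.0, 1.0, 2.0, 3.0])

# A step: the limiter zeroes the slopes, so no new extrema appear.
step_left, step_right = muscl_interface_states([0.0, 0.0, 1.0, 1.0])
```

In a full solver these interface states would be passed to a flux function such as AUSMPW+, which is not sketched here.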