Researcher profile

Guanhua Huang

Guanhua Huang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified · Verification L1 · Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint · 2026 · arXiv

Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training

Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), where Proximal Policy Optimization (PPO) provides a principled framework for stable policy updates. However, the practical application of PPO is hindered by unreliable advantage estimation in the sparse-reward RLVR regime. This issue arises because the sparse rewards in RLVR lead to inaccurate intermediate value predictions, which in turn introduce significant bias when aggregated at every token by Generalized Advantage Estimation (GAE). To address this, we introduce Segmental Advantage Estimation (SAE), which mitigates the bias that GAE can incur in RLVR. Our key insight is that aggregating $n$-step advantages at every token (as in GAE) is unnecessary and often introduces excessive bias, since individual tokens carry minimal information. Instead, SAE first partitions the generated sequence into coherent sub-segments using low-probability tokens as heuristic boundaries. It then selectively computes variance-reduced advantage estimates only from these information-rich segment transitions, effectively filtering out noise from intermediate tokens. Our experiments demonstrate that SAE achieves superior performance, with marked improvements in final scores, training stability, and sample efficiency. These gains are shown to be consistent across multiple model sizes, and a correlation analysis confirms that our proposed advantage estimator achieves a higher correlation with an approximate ground-truth advantage, justifying its superior performance.
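The segment-then-estimate idea in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the authors' implementation: the function names, the probability threshold, and the λ-weighting over segment boundaries are all assumptions made for this sketch. It treats each low-probability token as the start of a new segment and averages n-step advantages only over bootstrap points that fall on segment boundaries.

```python
from typing import List


def segment_starts(token_probs: List[float], threshold: float = 0.2) -> List[int]:
    """Position 0 plus every token whose probability is below the (assumed) threshold."""
    return [0] + [t for t in range(1, len(token_probs)) if token_probs[t] < threshold]


def n_step_advantage(rewards: List[float], values: List[float],
                     t: int, end: int, gamma: float = 1.0) -> float:
    """Advantage at token t that bootstraps from values[end]; values has length T + 1."""
    ret = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return ret + gamma ** (end - t) * values[end] - values[t]


def segmental_advantages(rewards: List[float], values: List[float],
                         token_probs: List[float], lam: float = 0.95,
                         gamma: float = 1.0, threshold: float = 0.2) -> List[float]:
    """Lambda-weighted mix of n-step advantages, using segment boundaries only.

    Assumption of this sketch: the j-th boundary after t gets weight
    (1 - lam) * lam**j, and the final boundary (end of sequence) takes the
    remaining mass lam**j, so the weights sum to 1.
    """
    T = len(rewards)
    # Bootstrap points: the start of every later segment, plus the sequence end.
    bounds = [s for s in segment_starts(token_probs, threshold) if s > 0] + [T]
    advantages = []
    for t in range(T):
        ends = [b for b in bounds if b > t]
        total = 0.0
        for j, end in enumerate(ends):
            is_last = j == len(ends) - 1
            weight = lam ** j if is_last else (1.0 - lam) * lam ** j
            total += weight * n_step_advantage(rewards, values, t, end, gamma)
        advantages.append(total)
    return advantages
```

Under this weighting, if every token is treated as a boundary the estimator reduces to ordinary finite-horizon GAE, and if no token is a boundary it reduces to the Monte Carlo return minus the value baseline.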

Preprint · 2020 · arXiv

Analytical expressions of variable specific yield for layered soils in shallow water table environments

This paper presents analytical expressions of variable specific yield for layered soils in shallow water table environments, introducing two distinct concepts: point specific yield (Syp) and interval average specific yield (Syi). Syp is the specific yield as the water table fluctuation approaches zero, and Syi is the specific yield over a finite interval of water table fluctuation. On the basis of the definition of specific yield and the van Genuchten model of soil water retention, analytical and semi-analytical expressions are proposed for Syp and Syi, respectively, in layered soils. The expressions are evaluated and verified against experimental data and compared with previous expressions. Analyses indicate that the expressions for Syp and Syi effectively reflect the changes and nonlinear behavior caused by soil hydraulic properties and soil layering under shallow water table conditions. The previously confused distinction between Syp and Syi is also clarified. The practicality and applicability of the specific yield expressions are analyzed comprehensively for potential applications in subsurface water modeling and management.
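To make the difference between Syp and Syi concrete, here is a minimal sketch for the simpler case of a single homogeneous soil layer at hydrostatic equilibrium. It does not reproduce the paper's layered-soil expressions. It assumes the standard van Genuchten retention curve with m = 1 - 1/n, takes the point specific yield at water table depth d as the saturated water content minus the water content at suction head d, and takes the interval value as the average of the point value between two depths. Function names and parameter values are hypothetical.

```python
def vg_theta(h: float, theta_r: float, theta_s: float, alpha: float, n: float) -> float:
    """van Genuchten water content at suction head h >= 0 (alpha in 1/length units of h)."""
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) * (1.0 + (alpha * h) ** n) ** (-m)


def point_specific_yield(d: float, theta_r: float, theta_s: float,
                         alpha: float, n: float) -> float:
    """Syp for a homogeneous profile (assumed simplification): theta_s - theta(d)."""
    return theta_s - vg_theta(d, theta_r, theta_s, alpha, n)


def interval_specific_yield(d1: float, d2: float, theta_r: float, theta_s: float,
                            alpha: float, n: float, steps: int = 1000) -> float:
    """Syi as the mean of Syp between depths d1 < d2, using the trapezoidal rule."""
    if d2 <= d1:
        raise ValueError("d2 must be greater than d1")
    width = (d2 - d1) / steps
    total = 0.0
    for i in range(steps):
        a = d1 + i * width
        b = a + width
        total += 0.5 * (point_specific_yield(a, theta_r, theta_s, alpha, n)
                        + point_specific_yield(b, theta_r, theta_s, alpha, n)) * width
    return total / (d2 - d1)
```

In this simplified setting Syp rises from zero at the surface toward theta_s - theta_r as the water table deepens, so Syi over an interval always lies between the Syp values at the interval's two ends.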