Researcher profile

Xiang Guo

Xiang Guo contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an important paradigm for unlocking reasoning capabilities in large language models, exemplified by the success of OpenAI o1 and DeepSeek-R1. Currently, Group Relative Policy Optimization (GRPO) stands as the dominant algorithm in this domain due to its stable training and critic-free efficiency. However, we argue that GRPO suffers from a structural limitation: it imposes a uniform, static trust region constraint across all samples. This design implicitly assumes signal homogeneity, a premise misaligned with the heterogeneous nature of outcome-driven learning, where advantage magnitudes and variances fluctuate significantly. Consequently, static constraints fail to fully exploit high-quality signals while insufficiently suppressing noise, often precipitating rapid entropy collapse. To address this, we propose Elastic Trust Regions (ETR), a dynamic mechanism that aligns optimization constraints with signal quality. ETR constructs a signal-aware landscape through dual-level elasticity: at the micro level, it scales clipping boundaries based on advantage magnitude to accelerate learning from high-confidence paths; at the macro level, it leverages group variance to implicitly allocate larger update budgets to tasks in the optimal learning zone. Extensive experiments on AIME and MATH benchmarks demonstrate that ETR consistently outperforms GRPO, achieving superior accuracy while effectively mitigating policy entropy degradation to ensure sustained exploration.
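
The micro-level idea in this abstract, a clipping boundary that widens with advantage magnitude, can be illustrated with a small sketch. This is not the paper's formulation: the abstract gives no formula, so the linear scaling rule, the cap, and every name and default below (`elastic_clip_range`, `base_eps`, `scale`, `max_eps`) are assumptions chosen for illustration. The advantage normalization follows the usual group-relative form (reward minus group mean, divided by group standard deviation), and the macro-level use of group variance is not modeled here.

```python
import statistics


def group_advantages(rewards, tiny=1e-8):
    """Group-relative advantages: (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + tiny) for r in rewards]


def elastic_clip_range(advantage, base_eps=0.2, scale=0.5, max_eps=0.4):
    """Hypothetical rule: widen the clip range linearly with |advantage|.

    Assumption for illustration only; the abstract does not state how
    ETR maps advantage magnitude to a clipping boundary.
    """
    return min(base_eps * (1.0 + scale * abs(advantage)), max_eps)


def clipped_surrogate(ratio, advantage, eps):
    """PPO-style clipped surrogate for one sample, given a clip range eps."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # verifiable 0/1 outcomes for one prompt
    ratio = 1.5                     # new-policy / old-policy probability ratio
    for adv in group_advantages(rewards):
        static = clipped_surrogate(ratio, adv, eps=0.2)
        elastic = clipped_surrogate(ratio, adv, eps=elastic_clip_range(adv))
        print(f"advantage={adv:+.2f} static={static:+.3f} elastic={elastic:+.3f}")
```

With a static range, every sample's probability ratio is clipped at the same boundary. Under the assumed elastic rule, samples with larger absolute advantage get a wider boundary, so a confident positive sample can contribute a larger update before clipping takes effect.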

Preprint, 2021, arXiv

ORCLSim: A System Architecture for Studying Bicyclist and Pedestrian Physiological Behavior Through Immersive Virtual Environments

Injuries and fatalities for vulnerable road users, especially bicyclists and pedestrians, are on the rise. To better inform design for vulnerable road users, we need to conduct more studies to evaluate how bicyclist and pedestrian behavior and physiological states change in different roadway designs and contextual settings. Previous research highlights the advantages of Immersive Virtual Environments (IVEs) in conducting bicyclist and pedestrian studies. These environments do not put participants at risk of getting injured, are low-cost compared to on-road or naturalistic studies, and allow researchers to fully control variables of interest. In this paper, we propose a framework, ORCLSim, to support human sensing techniques within IVEs to evaluate bicyclist and pedestrian physiological and behavioral changes in different contextual settings. To showcase this framework, we present two case studies where we collect and analyze pilot data from five participants' physiological and behavioral responses in an IVE setting, representing real-world roadway segments and traffic conditions. Results from these case studies indicate that physiological data are sensitive to road environment changes and real-time events, especially changes in heart rate and gaze behavior. Additionally, our preliminary data indicate that participants may respond differently to various roadway settings (e.g., intersections with or without a traffic signal). By analyzing these changes, we can identify how participants' stress levels and cognitive load are affected by the simulated surrounding environment. The ORCLSim system architecture can be further utilized for future studies of users' behavioral and physiological responses in different virtual reality settings.