Source author record

Siyuan Zhu

Siyuan Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language; Computer Vision; cond-mat.mes-hall; cond-mat.mtrl-sci; Machine Learning

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

preprint, 2026, arXiv

GAGPO: Generalized Advantage Grouped Policy Optimization

Reinforcement learning has become a powerful paradigm for post-training large language model agents, yet credit assignment in multi-turn environments remains a challenge. Agents often receive sparse, trajectory-level rewards only at the end of an episode, making it difficult to determine which intermediate actions contributed to success or failure. As a result, propagating delayed outcomes back to individual decision steps without relying on costly auxiliary value models remains an open problem. We propose Generalized Advantage Grouped Policy Optimization (GAGPO), a critic-free reinforcement learning method for precise, step-aligned temporal credit assignment. GAGPO constructs a non-parametric grouped value proxy from sampled rollouts and uses it to compute TD/GAE-style temporal advantages, recursively propagating outcome supervision backward through time. Combined with group-wise advantage normalization and an action-level importance ratio, GAGPO extracts stable, localized optimization signals directly from multi-turn trajectories. Experiments on ALFWorld and WebShop show that GAGPO outperforms strong reinforcement learning baselines. Further analyses demonstrate faster early-stage learning, improved interaction efficiency, and smoother optimization dynamics, suggesting that GAGPO offers a simple yet effective framework for multi-turn agentic reinforcement learning.
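The abstract's pipeline (grouped value proxy, TD/GAE-style advantages, group-wise normalization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that the value proxy at step t is the mean discounted return-to-go over all rollouts in the group that are still running at step t; the paper may group by state or use a different estimator. The function names and the `gamma`, `lam`, and `eps` parameters are hypothetical, and the action-level importance ratio is omitted.

```python
from typing import List


def grouped_value_proxy(rewards_group: List[List[float]], gamma: float) -> List[float]:
    """Assumed proxy: mean discounted return-to-go per step index across the group."""
    returns_to_go = []
    for rewards in rewards_group:
        running, out = 0.0, [0.0] * len(rewards)
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            out[t] = running
        returns_to_go.append(out)
    horizon = max(len(r) for r in returns_to_go)
    values = []
    for t in range(horizon):
        # Only rollouts that reach step t contribute to the proxy at t.
        alive = [r[t] for r in returns_to_go if t < len(r)]
        values.append(sum(alive) / len(alive))
    return values


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float, lam: float) -> List[float]:
    """Standard GAE recursion, using the grouped proxy in place of a learned critic."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Value after the final step of this rollout is taken to be zero.
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def normalize_group(adv_group: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Normalize advantages by the mean and standard deviation of the whole group."""
    flat = [a for adv in adv_group for a in adv]
    mean = sum(flat) / len(flat)
    std = (sum((a - mean) ** 2 for a in flat) / len(flat)) ** 0.5
    return [[(a - mean) / (std + eps) for a in adv] for adv in adv_group]


def group_advantages(rewards_group: List[List[float]],
                     gamma: float = 1.0, lam: float = 1.0) -> List[List[float]]:
    values = grouped_value_proxy(rewards_group, gamma)
    raw = [gae_advantages(r, values, gamma, lam) for r in rewards_group]
    return normalize_group(raw)
```

For a group containing one successful rollout (reward only at the final step) and one failed rollout, every step of the successful rollout receives a positive advantage and every step of the failed one a negative advantage, which is the backward propagation of a sparse outcome that the abstract describes.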

preprint, 2022, arXiv

Pointillism: Accurate 3D bounding box estimation with multi-radars

Autonomous perception requires high-quality environment sensing in the form of 3D bounding boxes of dynamic objects. The primary sensors used in automotive systems are light-based cameras and LiDARs. However, they are known to fail in adverse weather conditions. Radars can potentially solve this problem as they are barely affected by adverse weather conditions. However, specular reflections of wireless signals cause poor performance of radar point clouds. We introduce Pointillism, a system that combines data from multiple spatially separated radars with an optimal separation to mitigate these problems. We introduce a novel concept of Cross Potential Point Clouds, which uses the spatial diversity induced by multiple radars and solves the problem of noise and sparsity in radar point clouds. Furthermore, we present the design of RP-net, a novel deep learning architecture, designed explicitly for radar's sparse data distribution, to enable accurate 3D bounding box estimation. The spatial techniques designed and proposed in this paper are fundamental to radar point cloud distributions and would benefit other radar sensing applications.

preprint, 2015, arXiv

Ultrafast electron dynamics at the Dirac node of the topological insulator Sb$_2$Te$_3$

Topological insulators (TIs) are a new quantum state of matter. Their surfaces and interfaces act as a topological boundary to generate massless Dirac fermions with spin-helical textures. Investigation of fermion dynamics near the Dirac point is crucial for the future development of spintronic devices incorporating topological insulators. However, research so far has been unsatisfactory because of a substantial overlap with the bulk valence band and a lack of a completely unoccupied Dirac point (DP). Here, we explore the surface Dirac fermion dynamics in the TI Sb$_2$Te$_3$ by time- and angle-resolved photoemission spectroscopy (TrARPES). Sb$_2$Te$_3$ has an in-gap DP located completely above the Fermi energy ($E_F$). The excited electrons in the upper Dirac cone stay longer than those below the Dirac point to form an inverted population. This was attributed to a reduced density of states (DOS) near the DP.