Researcher profile

Zongkai Liu

Zongkai Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

GAGPO: Generalized Advantage Grouped Policy Optimization

Reinforcement learning has become a powerful paradigm for post-training large language model agents, yet credit assignment in multi-turn environments remains a challenge. Agents often receive sparse, trajectory-level rewards only at the end of an episode, making it difficult to determine which intermediate actions contributed to success or failure. As a result, propagating delayed outcomes back to individual decision steps without relying on costly auxiliary value models remains an open problem. We propose Generalized Advantage Grouped Policy Optimization (GAGPO), a critic-free reinforcement learning method for precise, step-aligned temporal credit assignment. GAGPO constructs a non-parametric grouped value proxy from sampled rollouts and uses it to compute TD/GAE-style temporal advantages, recursively propagating outcome supervision backward through time. Combined with group-wise advantage normalization and an action-level importance ratio, GAGPO extracts stable, localized optimization signals directly from multi-turn trajectories. Experiments on ALFWorld and WebShop show that GAGPO outperforms strong reinforcement learning baselines. Further analyses demonstrate faster early-stage learning, improved interaction efficiency, and smoother optimization dynamics, suggesting that GAGPO offers a simple yet effective framework for multi-turn agentic reinforcement learning.

preprint2026arXiv

Self-Supervised Learning with Noisy Dataset for Rydberg Microwave Sensors Denoising

We report a self-supervised deep learning framework for Rydberg sensors that enables single-shot noise suppression matching the accuracy of multi-measurement averaging. The framework eliminates the need for clean reference signals (hardly required in quantum sensing) by training on two sets of noisy signals with identical statistical distributions. When evaluated on Rydberg sensing datasets, the framework outperforms wavelet transform and Kalman filtering, achieving a denoising effect equivalent to 10,000-set averaging while reducing computation time by three orders of magnitude. We further validate performance across diverse noise profiles and quantify the complexity-performance trade-off of U-Net and Transformer architectures, providing actionable guidance for optimizing deep learning-based denoising in Rydberg sensor systems.

preprint2024arXiv

Policy-regularized Offline Multi-objective Reinforcement Learning

In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL. We extend the offline policy-regularized method, a widely-adopted approach for single-objective offline RL problems, into the multi-objective setting in order to achieve the above goal. However, such methods face a new challenge in offline MORL settings, namely the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations via approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL, in order to simultaneously learn a set of policies using a single policy network, thus reducing the computational cost induced by the training of a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.

preprint2020arXiv

Bi2Te3/Si thermophotovoltaic cells converting low temperature radiation into electricity

The thermophotovoltaic cells which convert the low temperature radiation into electricity are of significance due to their potential applications in many fields. In this work, Bi2Te3/Si thermophotovoltaic cells which work under the radiation from the blackbody with the temperature of 300 K-480 K are presented. The experimental results show that the cells can output electricity even under the radiation temperature of 300 K. The band structure of Bi2Te3/Si heterojunctions and the defects in Bi2Te3 thin films lower the conversion efficiency of the cells. It is also demonstrated that the resistivity of Si and the thickness of Bi2Te3 thin films have important effects on Bi2Te3/Si thermophotovoltaic cells. Although the cells' output power is small, this work provides a possible way to utilize the low temperature radiation.