Source author record

Zhiyuan Zhai

Zhiyuan Zhai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.SP Machine Learning

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators


Preprint, 2026, arXiv

Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL

Group-relative RL training (GRPO) samples a small group of parallel rollouts for every training prompt and uses their within-group reward spread to compute per-trajectory advantages. In agentic environments each rollout is a long multi-turn dialogue with one LLM call per step, so this multi-sample multiplier dominates the total training cost. When every rollout of a prompt ends with the same reward, the group has zero reward variance and contributes no gradient, so the extra rollouts add no information; such groups are common in practice (typically around 40% of all groups), so the wasted-compute fraction is substantial rather than marginal. Existing methods filter such groups at the prompt level, either after their rollouts are paid for or before any rollout begins, but both decide without using information that becomes available during the rollout itself. We instead ask whether the in-group divergence between the partial trajectories at an intermediate step can already predict that the group will be zero-variance: when the parallel rollouts have already converged on the same action prefix, the group is on track to produce a single reward, and we can stop early. We propose a one-parameter gate that stops a group when the mean pairwise prefix edit distance between its partial action sequences falls below a threshold. On a 60-iteration on-policy GRPO run on ALFWorld with Qwen2.5-7B, averaged over four random seeds, the gated arm finishes 10.7% faster in wall-clock (bootstrap 95% CI excludes 0) and shifts held-out success rate on 50 unseen tasks by +2.5 pp, with the held-out gain tracing to a measurable reduction in zero-advantage gradient-batch dilution. Code is available at https://github.com/zhiyuanZhai20/selective-rollout.
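The gate described in the abstract can be illustrated with a minimal sketch. It assumes each partial trajectory is a list of action strings, uses plain (unnormalized) Levenshtein distance over actions, and stops a group when the mean over all pairs falls strictly below the threshold. The function names and the choice of unnormalized distance are assumptions made for illustration; the released code may define the distance and the stopping rule differently.

```python
from itertools import combinations
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two action sequences (token = one action)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_pairwise_distance(prefixes: Sequence[Sequence[str]]) -> float:
    """Mean edit distance over all unordered pairs of partial trajectories."""
    pairs = list(combinations(prefixes, 2))
    if not pairs:
        return 0.0
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


def should_stop_group(prefixes: Sequence[Sequence[str]], threshold: float) -> bool:
    """Hypothetical gate: stop the group once its rollouts have converged."""
    return mean_pairwise_distance(prefixes) < threshold


# Illustrative action prefixes (made up, not taken from ALFWorld logs).
converged = [["go", "take"], ["go", "take"], ["go", "take"]]
diverse = [["go", "take"], ["look", "open"], ["go", "open"]]
stop_converged = should_stop_group(converged, threshold=0.5)
stop_diverse = should_stop_group(diverse, threshold=0.5)
```

In this sketch the converged group has mean distance 0 and is stopped, while the diverse group has mean distance 4/3 and continues. The threshold is the single tunable parameter the abstract refers to; the value 0.5 here is arbitrary.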

Preprint, 2022, arXiv

Energy-Efficient UAV-Mounted RIS Assisted Mobile Edge Computing

Unmanned aerial vehicles (UAVs) and reconfigurable intelligent surfaces (RISs) have recently been applied in the field of mobile edge computing (MEC) to improve the data exchange environment by proactively changing the wireless channels through maneuverable location deployment and intelligent signal reflection, respectively. Nevertheless, each may suffer from inherent limitations in practical scenarios. UAV-mounted RIS (U-RIS), as a promising integrated approach, can combine the advantages of UAV and RIS to overcome these limitations. Inspired by this, we consider a novel U-RIS assisted MEC system, where a U-RIS is deployed to assist the communication between the ground users and an MEC server. The joint design of UAV trajectory, RIS passive beamforming and MEC resource allocation is developed to maximize the energy efficiency (EE) of the system. To tackle the intractable non-convex problem, we divide it into two subproblems and solve them iteratively based on successive convex approximation (SCA) and the Dinkelbach method. Finally, we obtain a high-performance suboptimal solution. Simulation results show that the proposed algorithm significantly improves the energy efficiency of the MEC system.
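The Dinkelbach method named in the abstract turns a ratio objective such as energy efficiency (rate divided by power) into a sequence of subtractive problems. The sketch below shows only that generic iteration on a toy one-variable problem, with the inner maximization done by search over a finite grid. The rate and power models, the grid, and all names are assumptions chosen for illustration; the paper's actual subproblems (trajectory, beamforming, resource allocation) and its SCA step are not reproduced here.

```python
import math
from typing import Callable, Sequence, Tuple


def dinkelbach(candidates: Sequence[float],
               numerator: Callable[[float], float],
               denominator: Callable[[float], float],
               tol: float = 1e-9,
               max_iter: int = 100) -> Tuple[float, float]:
    """Maximize numerator(x) / denominator(x) over a finite candidate set.

    Assumes denominator(x) > 0 for every candidate.
    """
    lam = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        # Inner problem: maximize the subtractive form f(x) - lam * g(x).
        best = max(candidates, key=lambda x: numerator(x) - lam * denominator(x))
        gap = numerator(best) - lam * denominator(best)
        # Update lam to the ratio achieved by the current maximizer.
        lam = numerator(best) / denominator(best)
        if gap < tol:  # F(lam) is approximately 0, so lam is the optimal ratio
            break
    return best, lam


# Toy energy-efficiency model (assumed): rate log2(1 + p), power p + 1.
powers = [0.1 * k for k in range(1, 101)]
rate = lambda p: math.log2(1.0 + p)
total_power = lambda p: p + 1.0
best_power, best_ee = dinkelbach(powers, rate, total_power)
```

On a finite candidate set the iteration terminates after finitely many steps, and the returned ratio matches a brute-force search over the same grid. In a continuous setting the inner maximization would be replaced by a convex solver, which is where an approximation such as SCA would enter.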

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint