Source author record

Chendi Qu

Chendi Qu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SY Robotics Systems and Control

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint · 2026 · arXiv

SOP: A Scalable Online Post-Training System for Vision-Language-Action Models

Vision-language-action (VLA) models achieve strong generalization through large-scale pre-training, but real-world deployment requires expert-level task proficiency in addition to broad generality. Existing post-training approaches for VLA models are typically offline, single-robot, or task-specific, limiting effective on-policy adaptation and scalable learning from real-world interaction. We introduce a Scalable Online Post-training (SOP) system that enables online, distributed, multi-task post-training of generalist VLA models directly in the physical world. SOP tightly couples execution and learning through a closed-loop architecture in which a fleet of robots continuously streams on-policy experience and human intervention signals to a centralized cloud learner, and asynchronously receives updated policies. This design supports prompt on-policy correction, scales experience collection through parallel deployment, and preserves generality during adaptation. SOP is agnostic to the choice of post-training algorithm; we instantiate it with both interactive imitation learning (HG-DAgger) and reinforcement learning (RECAP). Across a range of real-world manipulation tasks including cloth folding, box assembly, and grocery restocking, we show that SOP substantially improves the performance of large pretrained VLA models while maintaining a single shared policy across tasks. Effective post-training can be achieved within hours of real-world interaction, and performance scales near-linearly with the number of robots in the fleet. These results suggest that tightly coupling online learning with fleet-scale deployment is instrumental to enabling efficient, reliable, and scalable post-training of generalist robot policies in the physical world.
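The closed loop described in this abstract (robots stream experience to a central learner and pick up new policies asynchronously) can be illustrated with a toy, single-process simulation. This is only a sketch under loud assumptions: the `Learner` and `Robot` classes, the `batch_size` and `sync_every` parameters, and the version counter standing in for a training step are all hypothetical and are not taken from the SOP system itself.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Learner:
    """Hypothetical central learner: buffers experience, publishes policy versions."""
    batch_size: int = 4
    policy_version: int = 0
    buffer: deque = field(default_factory=deque)

    def receive(self, sample):
        # Robots stream samples here as they are produced.
        self.buffer.append(sample)

    def maybe_update(self):
        # Stand-in for a training step: consume full batches, bump the version.
        while len(self.buffer) >= self.batch_size:
            for _ in range(self.batch_size):
                self.buffer.popleft()
            self.policy_version += 1


@dataclass
class Robot:
    """Hypothetical robot that acts with its locally cached policy version."""
    robot_id: int
    policy_version: int = 0

    def act(self, step):
        # Each sample records which policy version generated it.
        return {"robot": self.robot_id, "step": step,
                "policy_version": self.policy_version}

    def sync(self, learner):
        # Pull of the latest published policy (not done on every step).
        self.policy_version = learner.policy_version


def run(num_robots=2, steps=6, sync_every=2):
    learner = Learner()
    robots = [Robot(i) for i in range(num_robots)]
    for step in range(steps):
        for robot in robots:
            learner.receive(robot.act(step))
        learner.maybe_update()
        if (step + 1) % sync_every == 0:
            for robot in robots:
                robot.sync(learner)
    return learner, robots


learner, robots = run(num_robots=2, steps=6)
print(learner.policy_version, [r.policy_version for r in robots])
```

In this toy model, a larger fleet fills batches faster, so the learner publishes more versions in the same number of steps. That mirrors the idea of scaling experience collection through parallel deployment, but it does not reproduce any result reported in the paper.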

Preprint · 2022 · arXiv

Moving Target Interception Considering Dynamic Environment

The interception of moving targets is a widely studied problem. In this paper, we propose an algorithm for intercepting a moving target with a wheeled mobile robot in a dynamic environment. We first predict the future position of the target through polynomial fitting. The algorithm then generates an interception trajectory with path and speed decoupling. We use Hybrid A* search to plan a path and optimize it via gradient descent. To avoid the dynamic obstacles in the environment, we introduce an ST graph for speed planning. The speed curve is represented by piecewise Bézier curves for further optimization. Compared with other interception algorithms, we consider a dynamic environment and plan a safe trajectory that satisfies the kinematic characteristics of the wheeled robot while ensuring the accuracy of interception. Simulations show that the algorithm successfully achieves the interception tasks and has high computational efficiency.
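As an illustration of the piecewise Bézier speed representation mentioned in this abstract, here is a minimal sketch that evaluates such a profile with de Casteljau's algorithm. The function names, the unit-length time intervals, and the control-point values are assumptions made for the example; the paper's actual parameterization and optimization are not shown here.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] with de Casteljau's algorithm."""
    points = list(control_points)
    while len(points) > 1:
        # Repeated linear interpolation between neighbouring control points.
        points = [(1.0 - t) * p + t * q for p, q in zip(points, points[1:])]
    return points[0]


def piecewise_speed(segments, s):
    """Evaluate a piecewise Bezier speed profile at time s.

    segments: list of control-point lists, one per unit-length time interval
              (the unit length is an assumption of this sketch).
    s: time in [0, len(segments)].
    """
    index = min(int(s), len(segments) - 1)
    return bezier_point(segments[index], s - index)


# Hypothetical profile: accelerate from rest to 1.0, then slow down to 0.5.
# Adjacent segments share their boundary control point, so speed is continuous.
segments = [
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.5, 0.5],
]

for s in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(s, piecewise_speed(segments, s))
```

One reason this representation is convenient for optimization is that a Bézier curve lies within the convex hull of its control points. Keeping every control point inside the allowed speed range therefore keeps the whole curve inside it.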

Preprint · 2022 · arXiv

Multi-period Optimal Control for Mobile Agents Considering State Unpredictability

The optimal control of mobile agents is an important and challenging problem. Recent work shows that using a randomized mechanism in agents' control can make the state unpredictable and thus improve the security of agents. However, the unpredictable design has only been considered within a single period, which can lead to intolerable control performance over a long time horizon. This paper addresses the trade-off between control performance and state unpredictability of mobile agents over a long time horizon. Using random perturbations that follow uniform distributions to maximize the attackers' prediction errors of future states, we formulate the problem as a multi-period convex stochastic optimization problem and solve it through dynamic programming. Specifically, we design the optimal control strategy for both unconstrained and input-constrained systems. Analytical iterative expressions for the control are also provided. Simulations show that the algorithm increases the prediction errors under a Kalman filter while successfully meeting the control performance requirements.
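The basic trade-off in this abstract can be seen in a toy scalar example: adding a uniformly distributed perturbation to the control input makes the next state harder to predict, at the cost of a noisier trajectory. The sketch below is a simplification under stated assumptions. The scalar system, the fixed feedback `gain`, and the naive one-step predictor are all hypothetical stand-ins; the paper's dynamic-programming solution and Kalman-filter attacker model are not implemented here.

```python
import random


def simulate(a, b, gain, x0, steps, delta, seed=0):
    """Simulate x[k+1] = a*x[k] + b*(u[k] + w[k]) with w[k] ~ Uniform(-delta, delta).

    Returns the state trajectory and the one-step prediction errors of an
    observer that knows the nominal law u = -gain*x but not the perturbation w.
    """
    rng = random.Random(seed)
    x = x0
    states, errors = [x], []
    for _ in range(steps):
        u = -gain * x                       # nominal feedback control
        w = rng.uniform(-delta, delta)      # random perturbation on the input
        x_next = a * x + b * (u + w)        # true next state
        x_pred = a * x + b * u              # observer's prediction without w
        errors.append(abs(x_next - x_pred))
        states.append(x_next)
        x = x_next
    return states, errors


# Without perturbation the observer predicts the state perfectly.
_, errors_plain = simulate(a=1.0, b=1.0, gain=0.5, x0=4.0, steps=50, delta=0.0)
# With perturbation the prediction error is nonzero but bounded by b * delta.
_, errors_random = simulate(a=1.0, b=1.0, gain=0.5, x0=4.0, steps=50, delta=0.2)

print(max(errors_plain), sum(errors_random) / len(errors_random))
```

For a perturbation uniform on [-delta, delta], the variance is delta^2 / 3, so a larger `delta` raises the observer's expected squared error while also moving the state further from its nominal trajectory.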
