Source author record

Guanghui Ren

Guanghui Ren appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.optics · Robotics · Artificial Intelligence · Computer Vision · eess.SP · physics.app-ph

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators


Preprint · 2026 · arXiv

Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance

While Vision-Language-Action (VLA) models show strong generalizability across various tasks, real-world deployment of robotic policies still requires large-scale, high-quality human expert demonstrations. However, collecting data via human teleoperation requires continuous operator attention, which is costly and hard to scale. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance that enables robots to learn interactively during deployment. GCENT starts from an imperfect policy and improves it over time. When robot execution fails, GCENT lets the robot revert to a previous state through a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. The framework supports a one-human-to-many-robots supervision scheme with a Task Sentinel module, which autonomously predicts task success and solicits human intervention when necessary. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods and reaches comparable performance using less than half the data in long-horizon and precise tasks. We also quantify the data yield-to-effort ratio in multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.
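The rewind-and-refine loop described above can be illustrated with a toy control flow. This is a minimal sketch under loud assumptions, not GCENT's implementation: states and actions are integers, the "policy" is a lookup table whose entries are overwritten by human corrections (a stand-in for retraining), and `sentinel`, `teleop`, `rewind_steps` and `demo_len` are hypothetical names. Unlike the paper's Task Sentinel, which predicts task success, the toy sentinel simply flags an already-failed state.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

State = int   # toy state: progress along a task
Action = int  # toy action: change in progress

def rewind_and_refine(
    policy_table: Dict[State, Action],    # toy stand-in for a learned policy
    default_action: Action,
    step: Callable[[State, Action], State],
    sentinel: Callable[[State], bool],    # hypothetical: True when failure is detected
    teleop: Callable[[State], Action],    # hypothetical: the human's corrective action
    start: State,
    horizon: int,
    rewind_steps: int = 2,
    demo_len: int = 2,
) -> Tuple[State, List[Tuple[State, Action]]]:
    history = deque([start], maxlen=50)   # recent states the robot can revert to
    corrections: List[Tuple[State, Action]] = []
    state = start
    for _ in range(horizon):
        if sentinel(state):
            # Rewind: drop the most recent states and revert to an earlier one.
            for _ in range(min(rewind_steps, len(history) - 1)):
                history.pop()
            state = history[-1]
            # Refine: the human demonstrates a short corrective segment from there.
            for _ in range(demo_len):
                action = teleop(state)
                corrections.append((state, action))
                policy_table[state] = action  # toy update; a real system would retrain
                state = step(state, action)
                history.append(state)
        else:
            action = policy_table.get(state, default_action)
            state = step(state, action)
            history.append(state)
    return state, corrections

# Toy task: reach progress 5. The initial policy makes a mistake at state 2.
policy = {2: -10}
final_state, corrections = rewind_and_refine(
    policy, default_action=1, step=lambda s, a: s + a,
    sentinel=lambda s: s < 0, teleop=lambda s: 1, start=0, horizon=6,
)
```

Here the robot fails at state 2, rewinds two steps to state 1, receives a two-step human correction, and then finishes the task with the updated table. Scaling to one human and many robots would, in this toy view, mean running several such loops and routing their `teleop` calls to a single shared operator.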

Preprint · 2026 · arXiv

Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training

General-purpose robotic systems operating in open-world environments must achieve both broad generalization and high-precision action execution, a combination that remains challenging for existing Vision-Language-Action (VLA) models. While large Vision-Language Models (VLMs) improve semantic generalization, insufficient embodied reasoning leads to brittle behavior, and conversely, strong reasoning alone is inadequate without precise control. To provide a decoupled and quantitative assessment of this bottleneck, we introduce Embodied Reasoning Intelligence Quotient (ERIQ), a large-scale embodied reasoning benchmark in robotic manipulation, comprising 6K+ question-answer pairs across four reasoning dimensions. By decoupling reasoning from execution, ERIQ enables systematic evaluation and reveals a strong positive correlation between embodied reasoning capability and end-to-end VLA generalization. To bridge the gap from reasoning to precise execution, we propose FACT, a flow-matching-based action tokenizer that converts continuous control into discrete sequences while preserving high-fidelity trajectory reconstruction. The resulting GenieReasoner jointly optimizes reasoning and action in a unified space, outperforming both continuous-action and prior discrete-action baselines in real-world tasks. Together, ERIQ and FACT provide a principled framework for diagnosing and overcoming the reasoning-precision trade-off, advancing robust, general-purpose robotic manipulation. Project page: https://geniereasoner.github.io/GenieReasoner/

Preprint · 2025 · arXiv

Suspended Z-cut lithium niobate waveguides for stimulated Brillouin scattering

On-chip stimulated Brillouin scattering (SBS) has recently been demonstrated in thin-film lithium niobate (TFLN), an emerging material platform for integrated photonics with strong electro-optic and nonlinear properties. While previous work on SBS in TFLN has focused on surface SBS, in this contribution we experimentally demonstrate, for the first time, backward intra-modal SBS generation in suspended Z-cut TFLN waveguides. Our results show that multiple acoustic modes are trapped in this structure, producing a multi-peak Brillouin gain spectrum due to the excitation of higher-order acoustic modes. The findings extend the exploration of the TFLN waveguide platform for SBS interactions and provide a crucial step towards optical processors for microwave signals, or sensors, integrated on TFLN.
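For backward SBS, each guided acoustic mode with phase velocity v_a produces a gain line near the Brillouin shift f_B = 2 n_eff v_a / λ. A multi-peak gain spectrum like the one described can be approximated, to first order, as a sum of Lorentzian lines, one per acoustic mode. The sketch below assumes that simple model; every numeric value (effective index, acoustic velocities, linewidths, relative gains) is hypothetical and is not taken from the paper.

```python
import numpy as np

def brillouin_shift(n_eff: float, v_acoustic: float, wavelength: float) -> float:
    """Backward-SBS frequency shift f_B = 2 * n_eff * v_a / wavelength, in Hz."""
    return 2.0 * n_eff * v_acoustic / wavelength

def gain_spectrum(freqs, shifts, linewidths, peak_gains):
    """Sum of Lorentzian gain lines, one per guided acoustic mode."""
    g = np.zeros_like(freqs, dtype=float)
    for f_b, gamma, g0 in zip(shifts, linewidths, peak_gains):
        half = gamma / 2.0  # half width at half maximum
        g += g0 * half**2 / ((freqs - f_b) ** 2 + half**2)
    return g

# Hypothetical parameters, for illustration only.
wavelength = 1550e-9                     # pump wavelength (m)
n_eff = 2.0                              # optical effective index
velocities = [3500.0, 3900.0, 4300.0]    # acoustic mode velocities (m/s)
shifts = [brillouin_shift(n_eff, v, wavelength) for v in velocities]

freqs = np.arange(8e9, 12e9, 1e6)        # 1 MHz grid from 8 to 12 GHz
gain = gain_spectrum(freqs, shifts, linewidths=[30e6] * 3, peak_gains=[1.0, 0.6, 0.4])

# Locate local maxima: one peak per trapped acoustic mode.
is_peak = (gain[1:-1] > gain[:-2]) & (gain[1:-1] > gain[2:])
peak_freqs = freqs[1:-1][is_peak]
```

With these assumed values the first mode sits near 9.03 GHz, and the three lines are far enough apart relative to their 30 MHz linewidths that each appears as a separate peak.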

Preprint · 2022 · arXiv

Phase retrieval of programmable photonic integrated circuits based on an on-chip fractional-delay reference path

Programmable photonic integrated circuits (PICs), offering diverse signal processing functions within a single chip, are promising solutions for applications ranging from optical communications to artificial intelligence. As the scale and complexity of programmable PICs increase, characterizing them, and thus calibrating them, becomes increasingly challenging. Here we demonstrate a phase retrieval method for programmable PICs using an on-chip fractional-delay reference path. The impulse response of the chip can be uniquely and precisely identified from the insertion loss alone using a standard complex Fourier transform. We demonstrate our approach experimentally with a 4-tap finite-impulse-response chip. The results agree well with expectations and verify that the approach can determine the individual tap weights without additional ports or photodiodes.
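One way to see why a fractional-delay reference makes phase retrieval possible from power measurements alone: the inverse Fourier transform of |H(f)|² is the autocorrelation of the path amplitudes. Tap-tap terms fall at integer multiples of the tap delay T, while a reference offset by half a tap puts every tap-reference cross term at a distinct half-integer lag, where it can be read off with its phase intact. The sketch below is our own simplified model, not the paper's procedure. It assumes an ideal lossless measurement, a reference delay of exactly -T/2 relative to the first tap, a known reference amplitude, and hypothetical tap values; `measured_power` and `retrieve_taps` are illustrative names.

```python
import numpy as np

def measured_power(taps, ref, ref_delay, n_freq):
    """Simulate |H(f)|^2 for a tapped delay line plus a reference path.

    Taps sit at integer delays 0, 1, ..., len(taps)-1 (units of the tap delay T);
    the reference path sits at the fractional delay `ref_delay`.
    Frequencies span f*T in [0, 2): one period when all delays are multiples of T/2.
    """
    f = 2.0 * np.arange(n_freq) / n_freq
    H = ref * np.exp(-2j * np.pi * f * ref_delay)
    for k, h in enumerate(taps):
        H = H + h * np.exp(-2j * np.pi * f * k)
    return np.abs(H) ** 2  # only the power transmission is "measured"

def retrieve_taps(power, n_taps, ref):
    """Recover complex tap weights from the power spectrum alone.

    Assumes ref_delay = -0.5 and n_freq > 2 * (2 * n_taps - 1) to avoid aliasing.
    The inverse DFT of |H|^2 is indexed in steps of T/2: tap-tap terms land on
    even indices, and each cross term taps[k] * conj(ref) lands alone on index 2k+1.
    """
    corr = np.fft.ifft(power)
    return corr[1:2 * n_taps:2] / np.conj(ref)

# Hypothetical 4-tap FIR weights (amplitude and phase), for illustration only.
taps = np.array([0.8, 0.5 * np.exp(0.7j), 0.3 * np.exp(-1.2j), 0.6 * np.exp(2.0j)])
power = measured_power(taps, ref=1.0, ref_delay=-0.5, n_freq=64)
estimated = retrieve_taps(power, n_taps=4, ref=1.0)
```

If the reference amplitude were unknown, the recovered taps would differ from the true ones only by a common complex factor, so their relative amplitudes and phases would still be identified.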

Preprint · 2021 · arXiv

Video Relation Detection with Trajectory-aware Multi-modal Features

The video relation detection problem refers to detecting the relationships between different objects in videos, such as spatial relationships and action relationships. In this paper, we present a video relation detection method based on trajectory-aware multi-modal features. Given the complexity of visual relation detection in videos, we decompose the task into three sub-tasks: object detection, trajectory proposal, and relation prediction. We use a state-of-the-art object detection method to ensure accurate object trajectory detection, and multi-modal feature representations to help predict relations between objects. Our method won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74% mAP, surpassing other methods by a large margin.
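The three-stage decomposition (detection, trajectory proposal, relation prediction) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' method: per-frame detections are taken as given, trajectories are proposed by a hypothetical greedy IoU linker (`link_trajectories`), and relation prediction is replaced by a hand-written spatial rule (`spatial_relation`) standing in for a learned multi-modal classifier.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Track = List[Tuple[int, Box]]             # (frame index, box) pairs

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def link_trajectories(frames: List[List[Box]], thresh: float = 0.5) -> List[Track]:
    """Trajectory proposal: greedily link per-frame detections by IoU."""
    tracks: List[Track] = []
    for t, boxes in enumerate(frames):
        unmatched = list(range(len(boxes)))
        for track in tracks:
            last_t, last_box = track[-1]
            if last_t != t - 1 or not unmatched:
                continue  # track has ended, or no boxes left in this frame
            best = max(unmatched, key=lambda i: iou(last_box, boxes[i]))
            if iou(last_box, boxes[best]) >= thresh:
                track.append((t, boxes[best]))
                unmatched.remove(best)
        tracks.extend([(t, boxes[i])] for i in unmatched)  # start new trajectories
    return tracks

def spatial_relation(subj: Track, obj: Track) -> str:
    """Toy relation rule standing in for a learned relation classifier."""
    shared: Set[int] = {t for t, _ in subj} & {t for t, _ in obj}
    if not shared:
        return "none"

    def mean_cx(track: Track) -> float:
        # Mean horizontal box center over the frames both trajectories share.
        xs = [(b[0] + b[2]) / 2 for t, b in track if t in shared]
        return sum(xs) / len(xs)

    return "left_of" if mean_cx(subj) < mean_cx(obj) else "right_of"

# Hypothetical detections for three frames; the second object leaves after frame 1.
frames = [
    [(0, 0, 10, 10), (50, 0, 60, 10)],
    [(1, 0, 11, 10), (51, 0, 61, 10)],
    [(2, 0, 12, 10)],
]
tracks = link_trajectories(frames)
relation = spatial_relation(tracks[0], tracks[1])
```

Separating the stages this way means a stronger detector or a learned relation model could be swapped in without touching the trajectory linker.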