Source author record

Wenbo Zhang

Wenbo Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Robotics Machine Learning Computation and Language Cryptography and Security eess.IV eess.SP eess.SY Human-Computer Interaction Multiagent Systems Systems and Control

Catalog footprint

What is connected

10works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

We present Chain-of-Action (CoA), a novel visuo-motor policy paradigm built upon Trajectory Autoregressive Modeling. Unlike conventional approaches that predict next step action(s) forward, CoA generates an entire trajectory by explicit backward reasoning with task-specific goals through an action-level Chain-of-Thought (CoT) process. This process is unified within a single autoregressive structure: (1) the first token corresponds to a stable keyframe action that encodes the task-specific goals; and (2) subsequent action tokens are generated autoregressively, conditioned on the initial keyframe and previously predicted actions. This backward action reasoning enforces a global-to-local structure, allowing each local action to be tightly constrained by the final goal. To further realize the action reasoning structure, CoA incorporates four complementary designs: continuous action token representation; dynamic stopping for variable-length trajectory generation; reverse temporal ensemble; and multi-token prediction to balance action chunk modeling with global structure. As a result, CoA gives strong spatial generalization capabilities while preserving the flexibility and simplicity of a visuo-motor policy. Empirically, we observe CoA achieves the state-of-the-art performance across 60 RLBench tasks and 8 real-world manipulation tasks.

preprint2026arXiv

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings motivate that reasoning should be used selectively rather than universally, with awareness of possible distribution shift. We propose a Robust Adaptive Cost-Efficient Routing (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly accounts for distribution shift via a KL-divergence uncertainty set, admits an efficient primal--dual algorithm, and enjoys theoretical guarantees including uniqueness of the optimal policy and linear convergence. Extensive experiments show that RACER achieves superior accuracy--cost trade-offs under distribution shift.

preprint2026arXiv

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Autonomous systems are increasingly deployed in open and dynamic environments -- from city streets to aerial and indoor spaces -- where perception models must remain reliable under sensor noise, environmental variation, and platform shifts. However, even state-of-the-art methods often degrade under unseen conditions, highlighting the need for robust and generalizable robot sensing. The RoboSense 2025 Challenge is designed to advance robustness and adaptability in robot perception across diverse sensing scenarios. It unifies five complementary research tracks spanning language-grounded decision making, socially compliant navigation, sensor configuration generalization, cross-view and cross-modal correspondence, and cross-platform 3D perception. Together, these tasks form a comprehensive benchmark for evaluating real-world sensing reliability under domain shifts, sensor failures, and platform discrepancies. RoboSense 2025 provides standardized datasets, baseline models, and unified evaluation protocols, enabling large-scale and reproducible comparison of robust perception methods. The challenge attracted 143 teams from 85 institutions across 16 countries, reflecting broad community engagement. By consolidating insights from 23 winning solutions, this report highlights emerging methodological trends, shared design principles, and open challenges across all tracks, marking a step toward building robots that can sense reliably, act robustly, and adapt across platforms in real-world environments.

preprint2026arXiv

Towards Cross-Platform Generalization: Domain Adaptive 3D Detection with Augmentation and Pseudo-Labeling

This technical report represents the award-winning solution to the Cross-platform 3D Object Detection task in the RoboSense2025 Challenge. Our approach is built upon PVRCNN++, an efficient 3D object detection framework that effectively integrates point-based and voxel-based features. On top of this foundation, we improve cross-platform generalization by narrowing domain gaps through tailored data augmentation and a self-training strategy with pseudo-labels. These enhancements enabled our approach to secure the 3rd place in the challenge, achieving a 3D AP of 62.67% for the Car category on the phase-1 target domain, and 58.76% and 49.81% for Car and Pedestrian categories respectively on the phase-2 target domain.

preprint2022arXiv

Blockchain-assisted Undisclosed IIoT Vulnerabilities Trusted Sharing Protection with Dynamic Token

With the large-scale deployment of industrial internet of things (IIoT) devices, the number of vulnerabilities that threaten IIoT security is also growing dramatically, including a mass of undisclosed IIoT vulnerabilities that lack mitigation measures. Coordination Vulnerabilities Disclosure (CVD) is one of the most popular vulnerabilities sharing solutions, in which some security workers (SWs) can develop undisclosed vulnerabilities patches together. However, CVD assumes that sharing participants (SWs) are all honest, and thus offering chances for dishonest SWs to leak undisclosed IIoT vulnerabilities. To combat such threats, we propose an Undisclosed IIoT Vulnerabilities Trusted Sharing Protection (UIV-TSP) scheme with dynamic token. In this article, a dynamic token is an implicit access credential for an SW to acquire an undisclosed vulnerability information, which is only held by the system and constantly updated as the SW access. Meanwhile, the latest updated token can be stealthily sneaked into the acquired information as the traceability token. Once the undisclosed vulnerability information leaves the SW host, the embedded self-destruct program will be automatically triggered to prevent leaks since the destination MAC address in the traceability token has changed. To quickly distinguish dishonest SWs, trust mechanism is adopted to evaluate the trust value of SWs. Moreover, we design a blockchain-assisted continuous logs storage method to achieve the tamper-proofing of dynamic token and the transparency of undisclosed IIoT vulnerabilities sharing. The simulation results indicate that our proposed scheme is resilient to suppress dishonest SWs and protect the IoT undisclosed vulnerabilities effectively.

preprint2022arXiv

ME-Net: Multi-Encoder Net Framework for Brain Tumor Segmentation

Glioma is the most common and aggressive brain tumor. Magnetic resonance imaging (MRI) plays a vital role to evaluate tumors for the arrangement of tumor surgery and the treatment of subsequent procedures. However, the manual segmentation of the MRI image is strenuous, which limits its clinical application. With the development of deep learning, a large number of automatic segmentation methods have been developed, but most of them stay in 2D images, which leads to subpar performance. Moreover, the serious voxel imbalance between the brain tumor and the background as well as the different sizes and locations of the brain tumor makes the segmentation of 3D images a challenging problem. Aiming at segmenting 3D MRI, we propose a model for brain tumor segmentation with multiple encoders. The structure contains four encoders and one decoder. The four encoders correspond to the four modalities of the MRI image, perform one-to-one feature extraction, and then merge the feature maps of the four modalities into the decoder. This method reduces the difficulty of feature extraction and greatly improves model performance. We also introduced a new loss function named "Categorical Dice", and set different weights for different segmented regions at the same time, which solved the problem of voxel imbalance. We evaluated our approach using the online BraTS 2020 Challenge verification. Our proposed method can achieve promising results in the validation set compared to the state-of-the-art approaches with Dice scores of 0.70249, 0.88267, and 0.73864 for the intact tumor, tumor core, and enhanced tumor, respectively.

preprint2020arXiv

A Effective Carrier Phase Recovery Method in Tigth Time-Packing Fast than Nyquist Optical Communication System

We propose a new scheme that combines polybinary transformaton and corrected-BPS to compensate noise for PDM-FTN-QPSK when its accelerated factor is 0.5,which has 3.3 dB OSNR gain when phase noise is 800 kHz.

preprint2020arXiv

Hand-Gesture-Recognition Based Text Input Method for AR/VR Wearable Devices

Static and dynamic hand movements are basic way for human-machine interactions. To recognize and classify these movements, first these movements are captured by the cameras mounted on the augmented reality (AR) or virtual reality (VR) wearable devices. The hand is segmented using segmentation method and its gestures are passed to hand gesture recognition algorithm, which depends on depth-wise separable convolutional neural network for training, testing and finally running smoothly on mobile AR/VR devices, while maintaining the accuracy and balancing the load. A number of gestures are processed for identification of right gesture and to classify the gesture and ignore the all intermittent gestures. With proposed method, a user can write letters and numbers in air by just moving his/her hand in air. Gesture based operations are performed, and trajectory of hand is recorded as handwritten text. Finally, that handwritten text is processed for the text recognition.

preprint2020arXiv

Siamese Keypoint Prediction Network for Visual Object Tracking

Visual object tracking aims to estimate the location of an arbitrary target in a video sequence given its initial bounding box. By utilizing offline feature learning, the siamese paradigm has recently been the leading framework for high performance tracking. However, current existing siamese trackers either heavily rely on complicated anchor-based detection networks or lack the ability to resist to distractors. In this paper, we propose the Siamese keypoint prediction network (SiamKPN) to address these challenges. Upon a Siamese backbone for feature embedding, SiamKPN benefits from a cascade heatmap strategy for coarse-to-fine prediction modeling. In particular, the strategy is implemented by sequentially shrinking the coverage of the label heatmap along the cascade to apply loose-to-strict intermediate supervisions. During inference, we find the predicted heatmaps of successive stages to be gradually concentrated to the target and reduced to the distractors. SiamKPN performs well against state-of-the-art trackers for visual object tracking on four benchmark datasets including OTB-100, VOT2018, LaSOT and GOT-10k, while running at real-time speed.

preprint2019arXiv

MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding

Reinforcement learning is a promising approach to learning control policies for performing complex multi-agent robotics tasks. However, a policy learned in simulation often fails to guarantee even simple safety properties such as obstacle avoidance. To ensure safety, we propose multi-agent model predictive shielding (MAMPS), an algorithm that provably guarantees safety for an arbitrary learned policy. In particular, it operates by using the learned policy as often as possible, but instead uses a backup policy in cases where it cannot guarantee the safety of the learned policy. Using a multi-agent simulation environment, we show how MAMPS can achieve good performance while ensuring safety.

Wenbo Zhang

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms

Towards Cross-Platform Generalization: Domain Adaptive 3D Detection with Augmentation and Pseudo-Labeling

Blockchain-assisted Undisclosed IIoT Vulnerabilities Trusted Sharing Protection with Dynamic Token

ME-Net: Multi-Encoder Net Framework for Brain Tumor Segmentation

A Effective Carrier Phase Recovery Method in Tigth Time-Packing Fast than Nyquist Optical Communication System

Hand-Gesture-Recognition Based Text Input Method for AR/VR Wearable Devices

Siamese Keypoint Prediction Network for Visual Object Tracking

MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding