Source author record

Jiawei Xu

Jiawei Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Robotics, Computer Vision, Machine Learning, Computation and Language, Cryptography and Security, eess.IV, eess.SY, Human-Computer Interaction, Multiagent Systems, Systems and Control

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators


Preprint, 2026, arXiv

EponaV2: Driving World Model with Comprehensive Future Reasoning

Data scaling plays a pivotal role in the pursuit of general intelligence. However, the prevailing perception-planning paradigm in autonomous driving relies heavily on expensive manual annotations to supervise trajectory planning, which severely limits its scalability. Conversely, although existing perception-free driving world models achieve impressive driving performance, their real-world reasoning ability for planning is built solely on next-frame image forecasting. Due to insufficient supervision, these models often struggle with comprehensive scene understanding, resulting in unsatisfactory trajectory planning. In this paper, we propose EponaV2, a novel paradigm of driving world models, which achieves high-quality planning with comprehensive future reasoning. Inspired by how human drivers anticipate 3D geometry and semantics, we train our model to forecast more comprehensive future representations, which can be additionally decoded to future geometry and semantic maps. Extracting the 3D and semantic modalities enables our model to deeply understand the surrounding environment, and the future prediction task significantly enhances the real-world reasoning capabilities of EponaV2, ultimately leading to improved trajectory planning. Moreover, inspired by the training recipe of Large Language Models (LLMs), we introduce a flow matching group relative policy optimization mechanism to further improve planning accuracy. The state-of-the-art (SOTA) performance of EponaV2 among perception-free models on three NAVSIM benchmarks (+1.3 PDMS, +5.5 EPDMS) demonstrates the effectiveness of our methods.

Preprint, 2026, arXiv

Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline

Recent advances in LLM-based multi-agent systems (MAS) show that workflows composed of multiple LLM agents with distinct roles, tools, and communication patterns can outperform single-LLM baselines on complex tasks. However, most frameworks are homogeneous, where all agents share the same base LLM and differ only in prompts, tools, and positions in the workflow. This raises the question of whether such workflows can be simulated by a single agent through multi-turn conversations. We investigate this across seven benchmarks spanning coding, mathematics, general question answering, domain-specific reasoning, and real-world planning and tool use. Our results show that a single agent can reach the performance of homogeneous workflows with an efficiency advantage from KV cache reuse, and can even match the performance of an automatically optimized heterogeneous workflow. Building on this finding, we propose OneFlow, an algorithm that automatically tailors workflows for single-agent execution, reducing inference costs compared to existing automatic multi-agent design frameworks without trading off accuracy. These results position the single-LLM implementation of multi-agent workflows as a strong baseline for MAS research. We also note that single-LLM methods cannot capture heterogeneous workflows due to the lack of KV cache sharing across different LLMs, highlighting future opportunities in developing truly heterogeneous multi-agent systems.

Preprint, 2022, arXiv

Infrared and Visible Image Fusion via Interactive Compensatory Attention Adversarial Learning

Existing generative adversarial fusion methods generally concatenate source images and extract local features through convolution operations without considering their global characteristics, which tends to produce an unbalanced result biased towards either the infrared or the visible image. To this end, we propose a novel end-to-end model based on generative adversarial training to achieve better fusion balance, termed the interactive compensatory attention fusion network (ICAFusion). In particular, in the generator, we construct a multi-level encoder-decoder network with a triple path, and adopt infrared and visible paths to provide additional intensity and gradient information. Moreover, we develop interactive and compensatory attention modules to communicate their pathwise information, and model their long-range dependencies to generate attention maps, which focus more on infrared target perception and visible detail characterization, and further increase the representation power for feature extraction and feature reconstruction. In addition, dual discriminators are designed to identify the similar distribution between the fused result and the source images, and the generator is optimized to produce a more balanced result. Extensive experiments illustrate that ICAFusion obtains superior fusion performance and better generalization ability, surpassing other advanced methods in subjective visual description and objective metric evaluation. Our code will be made public at https://github.com/Zhishe-Wang/ICAFusion

Preprint, 2022, arXiv

PogoDrone: Design, Model, and Control of a Jumping Quadrotor

We present a design, model, and control for a novel jumping-flying robot that is called PogoDrone. The robot is composed of a quadrotor with a passive mechanism for jumping. The robot can continuously jump in place or fly like a normal quadrotor. Jumping in place allows the robot to quickly move and operate very close to the ground. For instance, in agricultural applications, the jumping mechanism allows the robot to take samples of soil. We propose a hybrid controller that switches from attitude to position control to allow the robot to fall horizontally and recover to the original position. We compare the jumping mode with the hovering mode to analyze the energy consumption. In simulations, we evaluate the effect of different factors on energy consumption. In real experiments, we show that our robot can repeatedly impact the ground, jump, and fly in a physical environment.

Preprint, 2022, arXiv

SeqNet: An Efficient Neural Network for Automatic Malware Detection

Malware continues to evolve rapidly, and more than 450,000 new samples are captured every day, which makes manual malware analysis impractical. However, existing deep learning detection models either need manual feature engineering or incur high computational overhead from long training processes, which makes selecting a feature space laborious and retraining to mitigate model aging difficult. Therefore, a crucial requirement for a detector is automatic and efficient detection. In this paper, we propose a lightweight malware detection model called SeqNet, which can be trained on raw binaries at high speed and with low memory requirements. By avoiding contextual confusion and reducing semantic loss, SeqNet maintains detection accuracy while reducing the number of parameters to only 136K. We demonstrate the effectiveness of our methods and the low training cost of SeqNet in our experiments. In addition, we make our datasets and code public to stimulate further academic research.

Preprint, 2021, arXiv

H-ModQuad: Modular Multi-Rotors with 4, 5, and 6 Controllable DOF

Traditional aerial vehicles are usually custom-designed for specific tasks. Although they offer an efficient solution, they are not always able to adapt to changes in the task specification, e.g., increasing the payload. This applies to quadrotors, having a maximum payload and only four controllable degrees of freedom, limiting their adaptability to the task's variations. We propose a versatile modular robotic system that can increase its payload and degrees of freedom by assembling heterogeneous modules; we call it H-ModQuad. It consists of cuboid modules propelled by quadrotors with tilted propellers that can generate forces in different directions. By connecting different types of modules, an H-ModQuad can increase its controllable degrees of freedom from 4 to 5 and 6. We model the general structure and propose three controllers, one for each number of controllable degrees of freedom. We extend the concept of the actuation ellipsoid to find the best reference orientation that can maximize the performance of the structure. Our approach is validated with experiments using actual robots, showing the independence of the translation and orientation of a structure.

Preprint, 2014, arXiv

A Cognitive Model for Humanoid Robot Navigation and Mapping using Alderbaran NAO

The aim of this work is to build a cognitive model for a humanoid robot, with a particular focus on navigation and mapping. The agent used is the Aldebaran NAO robot. The framework is applied to the integration of AI, computer vision, and signal processing problems. Our model can be divided into two parts: cognitive mapping and perception. Cognitive mapping is assumed to comprise three parts, whose proposed representations are a network of ASRs, an MFIS, and a hierarchy of Place Representations. Perception, on the other hand, is the traditional computer vision problem of image sensing, feature extraction, and tracking of objects of interest. The goals of our project can be summarized as follows. First, the robot should be able to determine where it is. Second, we would like to test the theory that this is how humans map their environment. The humanoid robot draws on human visual search by integrating visual mechanisms with computer vision techniques.

Preprint, 2014, arXiv

Perceiving Motion Cues Inspired by Microsoft Kinect Sensor on Game Experiencing

This paper proposes a novel method that replaces the traditional mouse controller with the Microsoft Kinect Sensor for human-machine interaction. From human hand gestures and movements, the Kinect Sensor can accurately recognize the participant's intention and transmit commands to a desktop or laptop. In addition, the current trend in the HCI market is to give customers more freedom and a richer experience by involving human cognitive factors more deeply. The Kinect Sensor continuously receives motion cues reflecting the participant's intention and feeds back the reaction during the experiments. A comparison of accuracy between hand movement and mouse cursor demonstrates the efficiency of the proposed method. In addition, the experimental results on hit rate in the games Fruit Ninja and Shape Touching demonstrate the real-time ability of the proposed framework. The performance evaluation lays a promising foundation for further applications in the field of human-machine interaction. The contribution of this work is the expansion on hand gesture perception and an early formulation on Mac iPad.
