Source author record

Wenjie Lu

Wenjie Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.SY Systems and Control Machine Learning Robotics Artificial Intelligence gr-qc Information Theory math.IT

Catalog footprint

What is connected

7works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents

Graphical User Interface (GUI) agents aim to automate a wide spectrum of human tasks by emulating user interaction. Despite rapid advancements, current approaches are hindered by several critical challenges: data bottleneck in end-to-end training, high cost of delayed error detection, and risk of contradictory guidance. Inspired by the human cognitive loop of Thinking, Alignment, and Reflection, we present D-Artemis -- a novel deliberative framework in this paper. D-Artemis leverages a fine-grained, app-specific tip retrieval mechanism to inform its decision-making process. It also employs a proactive Pre-execution Alignment stage, where Thought-Action Consistency (TAC) Check module and Action Correction Agent (ACA) work in concert to mitigate the risk of execution failures. A post-execution Status Reflection Agent (SRA) completes the cognitive loop, enabling strategic learning from experience. Crucially, D-Artemis enhances the capabilities of general-purpose Multimodal large language models (MLLMs) for GUI tasks without the need for training on complex trajectory datasets, demonstrating strong generalization. D-Artemis establishes new state-of-the-art (SOTA) results across both major benchmarks, achieving a 75.8% success rate on AndroidWorld and 96.8% on ScreenSpot-V2. Extensive ablation studies further demonstrate the significant contribution of each component to the framework.

preprint2022arXiv

Non-cascaded Control Barrier Functions for the Safe Control of Quadrotors

Researchers have developed various cascaded controllers and non-cascaded controllers for the navigation and control of quadrotors in recent years. It is vital to ensure the safety of a quadrotor both in normal state and in abnormal state if a controller tends to make the quadrotor unsafe. To this end, this paper proposes a non-cascaded Control Barrier Function (CBF) for a quadrotor controlled by either cascaded controllers or a non-cascaded controller. Incorporated with a Quadratic Programming (QP), the non-cascaded CBF can simultaneously regulate the magnitude of the total thrust and the torque of the quadrotor determined a controller, so as to ensure the safety of the quadrotor both in normal state and in abnormal state. The non-cascaded CBF establishes a non-conservative forward invariant safe region, in which the controller of a quadrotor is fully or partially effective in the navigation or the pose control of the quadrotor. The non-cascaded CBF is applied to a quadrotor performing trajectory tracking and a quadrotor performing aggressive roll maneuvers in simulations to evaluate the effectiveness of the non-cascaded CBF.

preprint2022arXiv

Safe Reinforcement Learning for a Robot Being Pursued but with Objectives Covering More Than Capture-avoidance

Reinforcement Learning (RL) algorithms show amazing performance in recent years, but placing RL in real-world applications such as self-driven vehicles may suffer safety problems. A self-driven vehicle moving to a target position following a learned policy may suffer a vehicle with unpredictable aggressive behaviors or even being pursued by a vehicle following a Nash strategy. To address the safety issue of the self-driven vehicle in this scenario, this paper conducts a preliminary study based on a system of robots. A safe RL framework with safety guarantees is developed for a robot being pursued but with objectives covering more than capture-avoidance. Simulations and experiments are conducted based on the system of robots to evaluate the effectiveness of the developed safe RL framework.

preprint2020arXiv

DOB-Net: Actively Rejecting Unknown Excessive Time-Varying Disturbances

This paper presents an observer-integrated Reinforcement Learning (RL) approach, called Disturbance OBserver Network (DOB-Net), for robots operating in environments where disturbances are unknown and time-varying, and may frequently exceed robot control capabilities. The DOB-Net integrates a disturbance dynamics observer network and a controller network. Originated from conventional DOB mechanisms, the observer is built and enhanced via Recurrent Neural Networks (RNNs), encoding estimation of past values and prediction of future values of unknown disturbances in RNN hidden state. Such encoding allows the controller generate optimal control signals to actively reject disturbances, under the constraints of robot control capabilities. The observer and the controller are jointly learned within policy optimization by advantage actor critic. Numerical simulations on position regulation tasks have demonstrated that the proposed DOB-Net significantly outperforms a conventional feedback controller and classical RL algorithms.

preprint2020arXiv

Modular Transfer Learning with Transition Mismatch Compensation for Excessive Disturbance Rejection

Underwater robots in shallow waters usually suffer from strong wave forces, which may frequently exceed robot's control constraints. Learning-based controllers are suitable for disturbance rejection control, but the excessive disturbances heavily affect the state transition in Markov Decision Process (MDP) or Partially Observable Markov Decision Process (POMDP). Also, pure learning procedures on targeted system may encounter damaging exploratory actions or unpredictable system variations, and training exclusively on a prior model usually cannot address model mismatch from the targeted system. In this paper, we propose a transfer learning framework that adapts a control policy for excessive disturbance rejection of an underwater robot under dynamics model mismatch. A modular network of learning policies is applied, composed of a Generalized Control Policy (GCP) and an Online Disturbance Identification Model (ODI). GCP is first trained over a wide array of disturbance waveforms. ODI then learns to use past states and actions of the system to predict the disturbance waveforms which are provided as input to GCP (along with the system state). A transfer reinforcement learning algorithm using Transition Mismatch Compensation (TMC) is developed based on the modular architecture, that learns an additional compensatory policy through minimizing mismatch of transitions predicted by the two dynamics models of the source and target tasks. We demonstrated on a pose regulation task in simulation that TMC is able to successfully reject the disturbances and stabilize the robot under an empirical model of the robot system, meanwhile improve sample efficiency.

preprint2014arXiv

An Information Value Function for Nonparametric Gaussian Processes

This paper presents a novel information value function that can be used in online sensor planning to monitor a spatial phenomenon in which the spatial phenomenon is modeled by nonparametric Gaussian processes. The information value function is derived from the Kullback-Leibler (KL) divergence and represents the information value brought by sensor decision. The sensor decision at every time step is to select the sensing location that maximizes the information value function associated with the measurement taken. Gaussian processes (GPs) are employed to obtain the posterior distribution of the spatial phenomenon given a number of sensor measurements, because GPs have sufficient flexibilities to adopt the complexity from data. Furthermore, a greedy algorithm is designed based on the information value function. By comparing the greedy algorithm with the random algorithm, it is shown that the error decreases faster defined as the difference between the estimated posterior distribution and the true distribution of the spatial phenomenon via the greedy algorithm.