Researcher profile

Zheng Wu

Zheng Wu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works · 0 followers · 6 topics · 4 close collaborators

Published work

8 published item(s)

preprint · 2026 · arXiv

Causal Probing for Internal Visual Representations in Multimodal Large Language Models

Despite the remarkable success of Multimodal Large Language Models (MLLMs) across diverse tasks, the internal mechanisms governing how they encode and ground distinct visual concepts remain poorly understood. To bridge this gap, we propose a causal framework based on activation steering to actively probe and manipulate internal visual representations. Through systematic intervention across four visual concept categories, our results reveal a divergence in concept encoding: entities exhibit distinct localized memorization, whereas abstract concepts are globally distributed across the network. Critically, this divergence uncovers a mechanistic driver of scaling laws: increasing model depth is indispensable for encoding distributed and complex abstract concepts, whereas entity localization remains remarkably invariant to scale. Furthermore, reverse steering reveals that blocking explicit output triggers a surge in latent activations, exposing a compensatory mechanism between perception and generation. Finally, extending our analysis to visual reasoning, we expose a disconnect between perception and reasoning: although MLLMs successfully recognize geometric relations, they treat them merely as static visual features, failing to trigger the procedural execution necessary for abstract problem-solving.

preprint · 2026 · arXiv

Faithful Mobile GUI Agents with Guided Advantage Estimator

Vision-language-model-based graphical user interface (GUI) agents have shown strong interaction capabilities. However, they often behave unfaithfully, relying on memorized shortcuts rather than grounding actions in displayed screen evidence or user instructions. To address this, we propose Faithful-Agent, a faithfulness-first framework that reformulates GUI interaction to prioritize evidence groundedness and internal consistency. Faithful-Agent employs a two-stage pipeline: (i) a faithfulness-oriented SFT stage to instill abstention behaviors under evidence perturbations; (ii) an RFT stage that further amplifies faithfulness by introducing the guided advantage estimator (GuAE), an anchor-based and variance-adaptive advantage tempering mechanism built upon GRPO. GuAE prevents advantage collapse in low-variance rollout groups under sparse GUI rewards, and with a thought-action consistency reward, Faithful-Agent (Stage II) elevates the Trap SR from 13.88% to 80.21% relative to the baseline, while preserving robust general instruction-following performance.

preprint · 2022 · arXiv

Extreme-mass-ratio burst detection with TianQin

The capture of compact objects by massive black holes in galaxies or dwarf galaxies will generate short gravitational wave signals, called extreme-mass-ratio bursts (EMRBs), before evolving into extreme-mass-ratio inspirals. Their detection will allow the black hole properties to be investigated and shed light on astronomy and astrophysics. In this work, we investigate the number of EMRBs that the TianQin observatory can detect. Our result shows that TianQin can detect tens of EMRB events during its mission lifetime. For those detected events, we use the Fisher information matrix to quantify the uncertainties in the inference of their parameters. We consider the possible network of TianQin+LISA and study how a network can improve parameter estimation. The result shows that, for most sources, the CO mass, the MBH mass, and the MBH spin can be determined with an accuracy of the order $10^{-1}$ and the sky localization can be determined with an accuracy of 10 square degrees. We further explore the gravitational wave background generated by the unresolved EMRBs and conclude that it is about $10^6$ times weaker than TianQin's sensitivity and can thus be ignored.

preprint · 2022 · arXiv

Offline-Online Learning of Deformation Model for Cable Manipulation with Graph Neural Networks

Robotic manipulation of deformable linear objects has a wide range of applications, e.g., manufacturing and medical surgery. To complete such tasks, an accurate dynamics model for predicting the deformation is critical for robust control. In this work, we address this challenge by proposing a hybrid offline-online method to learn the dynamics of cables in a robust and data-efficient manner. In the offline phase, we adopt a graph neural network (GNN) to learn the deformation dynamics purely from simulation data. Then a linear residual model is learned in real time to bridge the sim-to-real gap. The learned model is then used as the dynamics constraint of a trust-region-based model predictive controller (MPC) to calculate the optimal robot movements. The online learning and MPC run in a closed loop to robustly accomplish the task. Finally, comparative results with existing methods are provided to quantitatively show the effectiveness and robustness.

preprint · 2020 · arXiv

Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving

In the past decades, we have witnessed significant progress in the domain of autonomous driving. Advanced techniques based on optimization and reinforcement learning (RL) have become increasingly powerful at solving the forward problem: given designed reward/cost functions, how should we optimize them and obtain driving policies that interact with the environment safely and efficiently? Such progress has raised another equally important question: what should we optimize? Instead of manually specifying the reward functions, it is desirable to extract what human drivers try to optimize from real traffic data and assign that to autonomous vehicles to enable more naturalistic and transparent interaction between humans and intelligent agents. To address this issue, we present an efficient sampling-based maximum-entropy inverse reinforcement learning (IRL) algorithm in this paper. Different from existing IRL algorithms, by introducing an efficient continuous-domain trajectory sampler, the proposed algorithm can directly learn the reward functions in the continuous domain while considering the uncertainties in demonstrated trajectories from human drivers. We evaluate the proposed algorithm on real driving data, including both non-interactive and interactive scenarios. The experimental results show that the proposed algorithm achieves more accurate prediction performance with faster convergence speed and better generalization compared to other baseline IRL algorithms.

preprint · 2020 · arXiv

Expressing Diverse Human Driving Behavior with Probabilistic Rewards and Online Inference

In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important. Human behavior is naturally rich and diverse. Cost/reward learning, as an efficient way to learn and represent human behavior, has been successfully applied in many domains. Most traditional inverse reinforcement learning (IRL) algorithms, however, cannot adequately capture the diversity of human behavior since they assume that all behavior in a given dataset is generated by a single cost function. In this paper, we propose a probabilistic IRL framework that directly learns a distribution of cost functions in the continuous domain. Evaluations on both synthetic data and real human driving data are conducted. Both the quantitative and subjective results show that our proposed framework can better express diverse human driving behaviors, as well as extract different driving styles that match what human participants interpret in our user study.

preprint · 2020 · arXiv

Novel polymorphic phase of BaCu2As2: impact of flux for new phase formation in crystal growth

In this work, we have thoroughly studied the effects of flux composition and temperature on the crystal growth of the BaCu2As2 compound. While Pb and CuAs self-flux produce the well-known α-phase ThCr2Si2-type structure (Z=2), a new polymorphic phase of BaCu2As2 (β phase) with a much larger c lattice parameter (Z=10), which could be considered an intergrowth of the ThCr2Si2- and CaBe2Ge2-type structures, has been discovered via Sn flux growth. We have characterized this structure through single-crystal X-ray diffraction, transmission electron microscopy (TEM), and scanning transmission electron microscopy (STEM) studies. Furthermore, we compare this new polymorphic intergrowth structure with the α-phase BaCu2As2 (ThCr2Si2 type with Z=2) and the β-phase BaCu2Sb2 (intergrowth of ThCr2Si2 and CaBe2Ge2 types with Z=6), both with the same space group I4/mmm. Electrical transport studies reveal p-type carriers and magnetoresistivity up to 22% at 5 K and under a magnetic field of 7 T. Our work suggests a new route for the discovery of new polymorphic structures through flux and temperature control during material synthesis.

preprint · 2020 · arXiv

Orbital selectivity of layer resolved tunneling on iron superconductor Ba0.6K0.4Fe2As2

We use scanning tunneling microscopy/spectroscopy (STM/S) to elucidate the Cooper pairing of the iron pnictide superconductor Ba0.6K0.4Fe2As2. By a cold-cleaving technique, we obtain atomically resolved termination surfaces with different layer identities. Remarkably, we observe that the low-energy tunneling spectrum related to superconductivity has an unprecedented dependence on the layer identity. By cross-referencing with the angle-resolved photoemission results and the tunneling data of LiFeAs, we find that tunneling on each termination surface probes superconductivity through selecting distinct Fe-3d orbitals. These findings imply the real-space orbital features of the Cooper pairing in the iron pnictide superconductors, and suggest a new and general concept that, for complex multi-orbital materials, tunneling on different terminating layers can feature orbital selectivity.