Researcher profile

Haoran Xu

Haoran Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

A Non-compact Positivity-Preserving Scheme for Parabolic PDE via Conditional Expectation

We propose a novel non-compact, positivity-preserving scheme for linear non-divergence form parabolic equations. Based on the Feynman-Kac formula, the solution is expressed as a conditional expectation of an associated diffusion process. Instead of using compact Markov chain approximations, we employ a wide stencil scheme to approximate the conditional expectation, ensuring consistency and positivity preservation. This method is effective for anisotropic diffusion with mixed derivatives, where classical schemes often fail unless the covariance matrix is diagonally dominated. A key feature of our framework is its robust treatment of boundary conditions, which avoids the accuracy loss commonly encountered in BZ and semi-Lagrangian schemes. For Dirichlet boundaries, we introduce (i) a quad-tree non-uniform stopping time scheme with O($Δt^{1/2}$) accuracy and (ii) a quad-tree uniform stopping time scheme with O($Δt$) accuracy. For Neumann boundaries, we use discrete specular reflection with O($Δt^{1/2}$) convergence, while periodic boundaries are treated using modular wrapping, achieving O($Δt$) accuracy. All analyses are conducted under the practical scaling $Δt \sim h$. Except for the uniform stopping time scheme, all schemes are explicit. The schemes are unconditionally stable and positive preserving, thanks to the probabilistic structure. To ensure consistency, a non-compact stencil is involved, which leads to the large time step constraint $Δt \sim h$. Numerical experiments confirm the predicted $L^\infty$ convergence rates for all types of boundary conditions.

preprint2026arXiv

COVR:Collaborative Optimization of VLMs and RL Agent for Visual-Based Control

Visual reinforcement learning (RL) suffers from poor sample efficiency due to high-dimensional observations in complex tasks. While existing works have shown that vision-language models (VLMs) can assist RL, they often focus on knowledge distillation from the VLM to RL, overlooking the potential of RL-generated interaction data to enhance the VLM. To address this, we propose COVR, a collaborative optimization framework that enables the mutual enhancement of the VLM and RL policies. Specifically, COVR fine-tunes the VLM with RL-generated data to enhance the semantic reasoning ability consistent with the target task, and uses the enhanced VLM to further guide policy learning via action priors. To improve fine-tuning efficiency, we introduce two key modules: (1) an Exploration-Driven Dynamic Filter module that preserves valuable exploration samples using adaptive thresholds based on the degree of exploration, and (2) a Return-Aware Adaptive Loss Weight module that improves the stability of training by quantifying the inconsistency of sampling actions via return signals of RL. We further design a progressive fine-tuning strategy to reduce resource consumption. Extensive experiments show that COVR achieves strong performance across various challenging visual control tasks.

preprint2026arXiv

Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal Correlation

Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. However, existing methods often suffer from temporal inconsistencies, manifesting as flickering effects that degrade temporal coherence and adversely affect downstream decision-making. While recent approaches incorporate historical information to alleviate this issue, they often incur high computational costs and may introduce misaligned or redundant features that interfere with object detection. We propose OccLinker, a novel plugin framework that can be easily integrated into existing VONs to improve performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and generates correction occupancy components to refine the base network predictions. In addition, we introduce a new temporal consistency metric to quantitatively measure flickering effects. Extensive experiments on two benchmark datasets demonstrate that our method achieves superior performance with minimal computational overhead while effectively reducing flickering artifacts.

preprint2026arXiv

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

The large-scale deployment of personalized healthcare agents demands memory mechanisms that are exceptionally precise, safe, and capable of long-term clinical tracking. However, existing benchmarks primarily focus on daily open-domain conversations, failing to capture the high-stakes complexity of real-world medical applications. Motivated by the stringent production requirements of an industry-leading health management agent serving tens of millions of active users, we introduce MedMemoryBench. We develop a human-agent collaborative pipeline to synthesize highly realistic, long-horizon medical trajectories based on clinically grounded, synthetic patient archetypes. This process yields a massive, expertly validated dataset comprising approximately 2,000 sessions and 16,000 interaction turns. Crucially, MedMemoryBench departs from traditional static evaluations by pioneering an "evaluate-while-constructing" streaming assessment protocol, which precisely mirrors dynamic memory accumulation in production environments. Furthermore, we formalize and systematically investigate the critical phenomenon of memory saturation, where sustained information influx actively degrades retrieval and reasoning robustness. Comprehensive benchmarking reveals severe bottlenecks in mainstream architectures, particularly concerning complex medical reasoning and noise resilience. By exposing these fundamental flaws, MedMemoryBench establishes a vital foundation for developing robust, production-ready medical agents.

preprint2026arXiv

World2Minecraft: Occupancy-Driven Simulated Scenes Construction

Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from data contamination and limited flexibility. To mitigate this, we propose World2Minecraft to convert real-world scenes into structured Minecraft environments based on 3D semantic occupancy prediction. In the reconstructed scenes, we can effortlessly perform downstream tasks such as Vision-Language Navigation(VLN). However, we observe that reconstruction quality heavily depends on accurate occupancy prediction, which remains limited by data scarcity and poor generalization in existing models. We introduce a low-cost, automated, and scalable data acquisition pipeline for creating customized occupancy datasets, and demonstrate its effectiveness through MinecraftOcc, a large-scale dataset featuring 100,165 images from 156 richly detailed indoor scenes. Extensive experiments show that our dataset provides a critical complement to existing datasets and poses a significant challenge to current SOTA methods. These findings contribute to improving occupancy prediction and highlight the value of World2Minecraft in providing a customizable and editable platform for personalized embodied AI research. Project page:https://world2minecraft.github.io/.

preprint2023arXiv

Discriminator-Guided Model-Based Offline Imitation Learning

Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data. Including a learned dynamics model can potentially improve the state-action space coverage of expert data, however, it also faces challenging issues like model approximation/generalization errors and suboptimality of rollout data. In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning strategy, which uses the discriminator to guide and couple the learning process of the policy and dynamics model, resulting in improved model performance and robustness. Our framework can also be extended to the case when demonstrations contain a large proportion of suboptimal data. Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.

preprint2022arXiv

Constraints Penalized Q-learning for Safe Offline Reinforcement Learning

We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potential large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.

preprint2022arXiv

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning

Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry. We develop a new data-driven AI system, namely DeepThermal, to optimize the combustion control strategy for TPGUs. At its core, is a new model-based offline reinforcement learning (RL) framework, called MORE, which leverages historical operational data of a TGPU to solve a highly complex constrained Markov decision process problem via purely offline training. In DeepThermal, we first learn a data-driven combustion process simulator from the offline dataset. The RL agent of MORE is then trained by combining real historical data as well as carefully filtered and processed simulation data through a novel restrictive exploration scheme. DeepThermal has been successfully deployed in four large coal-fired thermal power plants in China. Real-world experiments show that DeepThermal effectively improves the combustion efficiency of TPGUs. We also report the superior performance of MORE by comparing with the state-of-the-art algorithms on the standard offline RL benchmarks.

preprint2022arXiv

Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations

We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a supplementary offline dataset from suboptimal behaviors. Prior works that address this problem either require that expert data occupies the majority proportion of the offline dataset, or need to learn a reward function and perform offline reinforcement learning (RL) afterwards. In this paper, we aim to address the problem without additional steps of reward learning and offline RL training for the case when demonstrations contain a large proportion of suboptimal data. Built upon behavioral cloning (BC), we introduce an additional discriminator to distinguish expert and non-expert data. We propose a cooperation framework to boost the learning of both tasks, Based on this framework, we design a new IL algorithm, where the outputs of discriminator serve as the weights of the BC loss. Experimental results show that our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.

preprint2022arXiv

Geometrical control of interface patterning underlies active matter invasion

Interaction between active materials and the boundaries of geometrical confinement is key to many emergent phenomena in active systems. For living active matter consisting of animal cells or motile bacteria, the confinement boundary is often a deformable interface, and it has been unclear how activity-induced interface dynamics might lead to morphogenesis and pattern formation. Here we studied the evolution of bacterial active matter confined by a deformable boundary. We discovered that an ordered morphological pattern emerged at the interface characterized by periodically-spaced interfacial protrusions; behind the interfacial protrusions, bacterial swimmers self-organized into multicellular clusters displaying +1/2 nematic defects. Subsequently, a hierarchical sequence of transitions from interfacial protrusions to creeping branches allowed the bacterial active drop to rapidly invade surrounding space with a striking self-similar branch pattern. We found that this interface patterning is controlled by the local curvature of the interface, a phenomenon we denote as collective curvature sensing. Using a continuum active model, we revealed that the collective curvature sensing arises from enhanced active stresses near high-curvature regions, with the active length-scale setting the characteristic distance between the interfacial protrusions. Our findings reveal a protrusion-to-branch transition as a novel mode of active matter invasion and suggest a new strategy to engineer pattern formation of active materials.

preprint2022arXiv

Model-Based Offline Planning with Trajectory Pruning

The recent offline reinforcement learning (RL) studies have achieved much progress to make RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world system control tasks, such as computational restriction during agent training and the requirement of extra control flexibility. The model-based planning framework provides an attractive alternative. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either provides over-restrictive planning or leads to inferior performance. We propose a new light-weighted model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollout guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.

preprint2022arXiv

Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer

The current state-of-the-art for few-shot cross-lingual transfer learning first trains on abundant labeled data in the source language and then fine-tunes with a few examples on the target language, termed target-adapting. Though this has been demonstrated to work on a variety of tasks, in this paper we show some deficiencies of this approach and propose a one-step mixed training method that trains on both source and target data with \textit{stochastic gradient surgery}, a novel gradient-level optimization. Unlike the previous studies that focus on one language at a time when target-adapting, we use one model to handle all target languages simultaneously to avoid excessively language-specific models. Moreover, we discuss the unreality of utilizing large target development sets for model selection in previous literature. We further show that our method is both development-free for target languages, and is also able to escape from overfitting issues. We conduct a large-scale experiment on 4 diverse NLP tasks across up to 48 languages. Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin, especially for languages that are linguistically distant from the source language, e.g., 7.36% F1 absolute gain on average for the NER task, up to 17.60% on Punjabi.