Researcher profile

Jiahong Wu

Jiahong Wu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
13 works
0 followers
8 topics
4 close collaborators

Published work

13 published item(s)

preprint, 2026, arXiv

DocOS: Towards Proactive Document-Guided Actions in GUI Agents

While Graphical User Interface (GUI) agents have shown promising performance in automated device interaction, they primarily depend on static parametric knowledge from pre-training or instruction tuning. This reliance fundamentally limits their ability to handle long-tailed tasks that require explicit procedural knowledge absent from model parameters, often forcing agents to resort to inefficient and brittle trial-and-error exploration. To mitigate this limitation, we introduce Proactive Document-Guided Action for GUI agents in dynamic, open-web environments, a novel paradigm that mirrors human problem-solving by enabling agents to autonomously search for relevant documentation to resolve long-tailed tasks. To evaluate agents' capability in this paradigm, we propose DocOS, a benchmark designed to assess document-guided problem solving in fully interactive environments. DocOS requires agents to autonomously navigate a web browser, locate relevant online documentation, comprehend procedural instructions, and faithfully ground them into executable GUI actions. Extensive experiments reveal that progress is strictly constrained by dual bottlenecks: agents struggle to reliably locate relevant information during proactive search and frequently fail to faithfully ground retrieved instructions into precise actions, pointing toward document-guided interaction as a crucial pathway for enabling self-evolving GUI agents in dynamic environments.

preprint, 2026, arXiv

Embedding-perturbed Exploration Preference Optimization for Flow Models

Recent advancements have established Reinforcement Learning (RL) as a pivotal paradigm for aligning generative models with human intent. However, group-based optimization frameworks (e.g., GRPO) face a critical limitation: the rapid decay of intra-group variance. As the distinctiveness among samples within a group diminishes, the variance approaches zero. This eliminates the very learning signal required for optimization, rendering the process unstable and forcing the policy into premature stagnation or reward hacking. Existing strategies, such as varying the initial noise or increasing group sizes, often fail to address this fundamental issue, resulting in training instability or diminishing returns. To overcome these challenges, we propose Embedding-perturbed Exploration Preference Optimization ($E^2$PO), a novel framework that sustains optimization through embedding-level perturbation. Our method introduces structured, embedding-level perturbations within sample groups, guaranteeing a robust variance that preserves the discriminative signal throughout the training process. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving a more faithful alignment with human preference.
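
The variance decay described in this abstract can be illustrated with a minimal sketch. It assumes the group-relative advantage commonly associated with GRPO, namely each reward minus the group mean, divided by the group standard deviation; the function name `group_advantages` and the `eps` stabilizer are hypothetical choices for illustration. It does not implement the paper's embedding-level perturbation.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps).

    Assumed form of the normalization, for illustration only.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# A group whose samples still differ: the advantages rank the samples.
diverse = group_advantages([0.1, 0.5, 0.9, 0.3])

# A group whose samples have become indistinguishable: every advantage is
# zero, so the group contributes no learning signal.
collapsed = group_advantages([0.7, 0.7, 0.7, 0.7])

print(diverse)
print(collapsed)
```

When all rewards in a group coincide, every numerator is zero, which is the vanishing signal the abstract attributes to decaying intra-group variance.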

preprint, 2025, arXiv

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding. However, they suffer from a critical vulnerability: an over-reliance on language priors, which can lead to visually ungrounded hallucinations, especially when processing counterfactual videos that defy common sense. This limitation, stemming from the intrinsic data imbalance between text and video, is challenging to address due to the substantial cost of collecting and annotating counterfactual data. To address this, we introduce DualityForge, a novel counterfactual data synthesis framework that employs controllable, diffusion-based video editing to transform real-world videos into counterfactual scenarios. By embedding structured contextual information into the video editing and QA generation processes, the framework automatically produces high-quality QA pairs together with original-edited video pairs for contrastive training. Based on this, we build DualityVidQA, a large-scale video dataset designed to reduce MLLM hallucinations. In addition, to fully exploit the contrastive nature of our paired data, we propose Duality-Normalized Advantage Training (DNA-Train), a two-stage SFT-RL training regime where the RL phase applies pair-wise $\ell_1$ advantage normalization, thereby enabling a more stable and efficient policy optimization. Experiments on DualityVidQA-Test demonstrate that our method substantially reduces model hallucinations on counterfactual videos, yielding a relative improvement of 24.0% over the Qwen2.5-VL-7B baseline. Moreover, our approach achieves significant gains across both hallucination and general-purpose benchmarks, indicating strong generalization capability. We will open-source our dataset and code.

preprint, 2023, arXiv

Calmed 3D Navier-Stokes Equations: Global Well-Posedness, Energy Identities, Global Attractors, and Convergence

We propose a modification to the nonlinear term of the three-dimensional incompressible Navier-Stokes equations (NSE) in either advective or rotational form which "calms" the system in the sense that the algebraic degree of the nonlinearity is effectively reduced. This system, the calmed Navier-Stokes Equations (calmed NSE), utilizes a "calming function" in the nonlinear term to locally constrain large advective velocities. Notably, this approach avoids the direct smoothing or filtering of derivatives, thus we make no modifications to the boundary conditions. Under suitable conditions on the calming function, we are able to prove global well-posedness of calmed NSE and show the convergence of calmed NSE solutions to NSE solutions on the time interval of existence for the latter. In addition, we prove that the dynamical system generated by the calmed NSE in the rotational form possesses both an energy identity and a global attractor. Moreover, we show that strong solutions to the calmed equations converge to strong solutions of the NSE without assuming their existence, providing a new proof of the existence of strong solutions to the 3D Navier-Stokes equations.

preprint, 2022, arXiv

Global well-posedness for 2D non-resistive compressible MHD system in periodic domain

This paper focuses on the 2D compressible magnetohydrodynamic (MHD) equations without magnetic diffusion in a periodic domain. We present a systematic approach to establishing the global existence of smooth solutions when the initial data is close to a background magnetic field. In addition, stability and large-time decay rates are obtained. When there is no magnetic diffusion, the magnetic field and the density are governed by forced transport equations and the problem considered here is difficult. This paper implements several key observations and ideas to maximize the enhanced dissipation due to hidden structures and interactions. In particular, the weak smoothing and stabilization generated by the background magnetic field and the extra regularization in the divergence part of the velocity field are fully exploited. Compared with previous works, this paper appears to be the first to investigate such a system on bounded domains and the first to solve this problem by pure energy estimates, which help reduce the complexity found in other approaches. In addition, this paper combines the well-posedness with the precise large-time behavior, a strategy that can be extended to higher dimensions.

preprint, 2020, arXiv

AinnoSeg: Panoramic Segmentation with High Perfomance

Panoramic segmentation is a setting in which image segmentation is more difficult. With the development of CNNs, panoramic segmentation has advanced considerably. However, current panoramic segmentation algorithms focus on contextual semantics and do not process image details sufficiently. Moreover, they do not solve problems such as the accurate segmentation of occluded objects, the segmentation of small objects, and the classification of boundary pixels. To address these issues, this paper presents several useful tricks. (a) Changing the basic segmentation model so that it takes into account both large objects and the boundary pixel classification of image details. (b) Modifying the loss function so that it takes into account the boundary pixels of multiple objects in the image. (c) Using a semi-supervised approach to regain control of the training process. (d) Using multi-scale training and inference. Together, these operations are named AinnoSeg, which achieves state-of-the-art performance on the well-known ADE20K dataset.

preprint, 2020, arXiv

Global regularity of the three-dimensional fractional micropolar equations

The global well-posedness of the smooth solution to the three-dimensional (3D) incompressible micropolar equations is a difficult open problem. This paper focuses on the 3D incompressible micropolar equations with fractional dissipations $(-Δ)^α u$ and $(-Δ)^β w$. Our objective is to establish the global regularity of the fractional micropolar equations with the minimal amount of dissipation. We prove that, if $α\geq \frac{5}{4}$, $β\geq 0$ and $α+β\geq\frac{7}{4}$, the fractional 3D micropolar equations always possess a unique global classical solution for any sufficiently smooth data. In addition, we obtain the global regularity of the 3D micropolar equations with the dissipations given by Fourier multipliers that are logarithmically weaker than the fractional Laplacian.

preprint, 2020, arXiv

Provenance-based Classification Policy based on Encrypted Search

As an important type of cloud data, digital provenance is attracting increasing attention for improving system performance. Currently, provenance has been employed to provide cues regarding access control and to estimate data quality. However, provenance itself might also be sensitive information. Therefore, provenance might be encrypted and stored in the cloud. In this paper, we provide a mechanism to classify cloud documents by searching for specific keywords in their encrypted provenance, and we prove that our scheme achieves semantic security. In terms of application, since files are classified and stored separately in the cloud to facilitate regulation and security protection, classification policies can use provenance as conditions to determine the category of a document. A simple example policy is: documents that have been reviewed twice can be classified as "publicly accessible", meaning they can be accessed by the public.

preprint, 2020, arXiv

Stability of Couette flow for 2D Boussinesq system with vertical dissipation

This paper establishes the nonlinear stability of the Couette flow for the 2D Boussinesq equations with only vertical dissipation. The Boussinesq equations concerned here model buoyancy-driven fluids such as atmospheric and oceanographic flows. Due to the presence of the buoyancy forcing, the energy of the standard Boussinesq equations could grow in time. It is the enhanced dissipation created by the linear non-self-adjoint operator $y\partial_x -ν\partial_{yy}$ in the perturbation equation that makes the nonlinear stability possible. When the initial perturbation from the Couette flow $(y, 0)$ is no more than the viscosity to a suitable power (in the Sobolev space $H^b$ with $b>\frac43$), we prove that the solution of the 2D Boussinesq system with only vertical dissipation on $\mathbb T\times \mathbb R$ remains close to the Couette at the same order. A special consequence of this result is the stability of the Couette for the 2D Navier-Stokes equations with only vertical dissipation.
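
The enhanced dissipation attributed above to $y\partial_x - ν\partial_{yy}$ can be illustrated on the linear model problem alone. The following is a sketch under assumed conventions: Fourier transform in both variables on $\mathbb T\times\mathbb R$, with $k$ and $η$ the frequencies dual to $x$ and $y$, and $f_0$ the initial datum. It is not the paper's nonlinear argument.

```latex
\begin{aligned}
&\partial_t f + y\,\partial_x f - \nu\,\partial_{yy} f = 0
\;\Longrightarrow\;
\partial_t \hat f - k\,\partial_\eta \hat f = -\nu\,\eta^2 \hat f, \\
&\hat f(t,k,\eta) = \hat f_0(k,\eta + kt)\,
\exp\!\Big(-\nu \int_0^t (\eta + k\tau)^2\,d\tau\Big), \\
&\int_0^t (\eta + k\tau)^2\,d\tau
= \eta^2 t + \eta k t^2 + \tfrac13 k^2 t^3
\;\ge\; \tfrac1{12}\,k^2 t^3 .
\end{aligned}
```

The lower bound is attained at $η = -kt/2$. Each mode with $k \neq 0$ therefore decays at least like $e^{-νk^2t^3/12}$, on the time scale $ν^{-1/3}$, which is much shorter than the purely diffusive scale $ν^{-1}$ when $ν$ is small.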

preprint, 2020, arXiv

The stabilizing effect of the temperature on buoyancy-driven fluids

The Boussinesq system for buoyancy-driven fluids couples the momentum equation forced by the buoyancy with the convection-diffusion equation for the temperature. One fundamental issue on the Boussinesq system is the stability problem on perturbations near the hydrostatic balance. This problem can be extremely difficult when the system lacks full dissipation. This paper solves the stability problem for a two-dimensional Boussinesq system with only vertical dissipation and horizontal thermal diffusion. We establish the stability for the nonlinear system and derive precise large-time behavior for the linearized system. The results presented in this paper reveal a remarkable phenomenon for buoyancy-driven fluids. That is, the temperature actually smooths and stabilizes the fluids. If the temperature were not present, the fluid would be governed by the 2D Navier-Stokes equations with only vertical dissipation, and its stability remains open. It is the coupling and interaction between the temperature and the velocity in the Boussinesq system that makes the stability problem studied here possible. Mathematically the system can be reduced to degenerate and damped wave equations that fuel the stabilization.

preprint, 2018, arXiv

A Large-scale Attribute Dataset for Zero-shot Learning

Zero-Shot Learning (ZSL) has attracted huge research attention over the past few years; it aims to learn new concepts that have never been seen before. In classical ZSL algorithms, attributes are introduced as the intermediate semantic representation to realize the knowledge transfer from seen classes to unseen classes. Previous ZSL algorithms are tested on several benchmark datasets annotated with attributes. However, these datasets are defective in terms of the image distribution and attribute diversity. In addition, we argue that the "co-occurrence bias problem" of existing datasets, which is caused by the biased co-occurrence of objects, significantly hinders models from correctly learning the concept. To overcome these problems, we propose a Large-scale Attribute Dataset (LAD). Our dataset has 78,017 images of 5 super-classes and 230 classes. The number of images in LAD is larger than the sum of the four most popular attribute datasets. 359 attributes of visual, semantic and subjective properties are defined and annotated at the instance level. We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset. The experimental results reveal the challenge of implementing zero-shot learning on our dataset.

preprint, 2017, arXiv

AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding

Significant progress has been achieved in Computer Vision by leveraging large-scale image datasets. However, large-scale datasets for complex Computer Vision tasks beyond classification are still limited. This paper proposes a large-scale dataset named AIC (AI Challenger) with three sub-datasets: human keypoint detection (HKD), large-scale attribute dataset (LAD) and image Chinese captioning (ICC). In this dataset, we annotate class labels (LAD), keypoint coordinates (HKD), bounding boxes (HKD and LAD), attributes (LAD) and captions (ICC). These rich annotations bridge the semantic gap between low-level images and high-level concepts. The proposed dataset is an effective benchmark to evaluate and improve different computational methods. In addition, for related tasks, others can also use our dataset as a new resource to pre-train their models.