Researcher profile

Wenlong Zhang

Wenlong Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

Earth Science Foundation Models: From Perception to Reasoning and Discovery

Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks spanning diverse Earth science tasks and modalities. We further discuss key challenges in multimodal data heterogeneity, scientific reliability and continual updating, scalability and sustainability, and the transition from foundation models to agentic and embodied Earth intelligence, and outline future directions toward more integrated, trustworthy, and actionable AI Earth scientists. Overall, this paper offers a structured roadmap for understanding the development of Earth foundation models from both capability depth and application breadth.

preprint2026arXiv

Earth-o1: A Grid-free Observation-native Atmospheric World Model

Despite the unprecedented volume of multimodal data provided by modern Earth observation systems, our ability to model atmospheric dynamics remains constrained. Traditional modeling frameworks force heterogeneous measurements into predefined spatial grids, inherently limiting the full exploitation of raw sensor data and creating severe computational bottlenecks. Here we present Earth-o1, an observation-native atmospheric world model that overcomes these structural limitations. Rather than relying on conventional atmospheric dynamical modeling systems or traditional data assimilation, Earth-o1 directly learns the continuous, three-dimensional physical evolution of the Earth system from ungridded observational data. By integrating diverse sensor inputs into a unified, grid-free dynamical field, the model autonomously advances the atmospheric state in space and time. We show that this fundamentally distinct paradigm enables direct, real-time forecasting and cross-sensor inference without the overhead of explicit numerical solvers. In hindcast evaluations, Earth-o1 achieves surface forecast skill comparable to the operational Integrated Forecasting System (IFS). These results establish that continuous, observation-driven world models -- a new class of fully observation-native geophysical simulators -- can match the fidelity of established physical frameworks, providing a scalable data-driven foundation for a digital twin of the Earth.

preprint2026arXiv

FlowSearch: Advancing deep research with dynamic structured knowledge flow

Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for agentic systems. To address this, we propose FlowSearch, a multi-agent framework that actively constructs and evolves a dynamic structured knowledge flow to drive subtask execution and reasoning. FlowSearch is capable of strategically planning and expanding the knowledge flow to enable parallel exploration and hierarchical task decomposition, while also adjusting the knowledge flow in real time based on feedback from intermediate reasoning outcomes and insights. FlowSearch achieves competitive performance on both general and scientific benchmarks, including GAIA, HLE, GPQA and TRQA, demonstrating its effectiveness in multi-disciplinary research scenarios and its potential to advance scientific discovery. The code is available at https://github.com/InternScience/InternAgent.

preprint2026arXiv

PICABench: How Far Are We from Physically Realistic Image Editing?

Image editing has achieved remarkable progress recently. Modern editing models could already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are the key to the generation realism. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimension (spanning optics, mechanics, and state transitions) for most of the common editing operations (add, remove, attribute change, etc.). We further propose the PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with large rooms to explore. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.

preprint2026arXiv

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

Large language models can fail in critic interaction not only by answering incorrectly, but also by abandoning an initially correct scientific solution after user criticism. This is especially risky in scientific reasoning, where user criticism can turn a valid answer into an incorrect one. We frame critic interaction as an inter-turn correctness-transition problem rather than a final-answer accuracy problem, and identify three challenges: transition awareness, decoupling useful correction from harmful sycophancy, and scalable rollout. We propose ReCrit, a transition-aware reinforcement learning framework that decomposes Initial-to-Critic behavior into four quadrants: Correction, Sycophancy, Robustness, and Boundary. ReCrit rewards correction and robustness, penalizes sycophancy, and treats persistent errors as weak boundary signals. To make interaction training practical, ReCrit further uses dynamic asynchronous rollout with tail-adaptive completion to reduce rollout waiting. On three scientific reasoning benchmarks, ChemBench, TRQA, and EarthSE, ReCrit improves average Critic accuracy from 38.15 to 51.49 on Qwen3.5-4B and from 45.40 to 55.59 on Qwen3.5-9B. Ablations show that final-answer rewards provide little interaction-level gain, while transition-aware rewards and quadrant weighting produce more distinguishable training signals and larger net Critic-stage improvement. The code is available at https://github.com/black-yt/ReCrit .

preprint2026arXiv

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the core competencies of scientific intelligence, including Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding, Scientific Symbolic Reasoning, Scientific Code Generation, Science Hypothesis Generation and Scientific Knowledge Understanding. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. SciEvalKit builds a foundation of expert-grade scientific benchmarks, curated from real-world, domain-specific datasets, ensuring that tasks reflect authentic scientific challenges. The toolkit features a flexible, extensible evaluation pipeline that enables batch evaluation across models and datasets, supports custom model and dataset integration, and provides transparent, reproducible, and comparable results. By bridging capability-based evaluation and disciplinary diversity, SciEvalKit offers a standardized yet customizable infrastructure to benchmark the next generation of scientific foundation models and intelligent agents. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.

preprint2025arXiv

Solving the inverse Source Problems for wave equation with final time measurements by a data driven approach

This paper develops a discrete data-driven approach for solving the inverse source problem of the wave equation with final time measurements. Focusing on the $L^2$-Tikhonov regularization method, we analyze its convergence under two different noise models, using noisy discrete spatial observations. By exploiting the spectral decomposition of the forward operator and introducing a noise separation technique into the variational framework, we establish error bounds for the reconstructed solution $u$ and the source term $f$ without requiring classical source conditions. Moreover, an expected convergence rate for the source error is derived in a weaker topology. We also extend the analysis to the fully discrete case with finite element discretization, showing that the overall error depends only on the noise level, regularization parameter, time step size, and spatial mesh size. These estimates provide a basis for selecting the optimal regularization parameter in a data-driven manner, without a priori information. Numerical experiments validate the theoretical results and demonstrate the efficiency of the proposed algorithm.

preprint2023arXiv

Beam Delivery and Beamstrahlung Considerations for Ultra-High Energy Linear Colliders

As part of the Snowmass'21 community planning excercise, the Advanced Accelerator Concepts (AAC) community proposed future linear colliders with center-of-mass energies up to 15 TeV and luminosities up to 50$\times10^{34}$ cm$^{-2}$s$^{-1}$ in a compact footprint. In addition to being compact, these machines must also be energy efficient. We identify two challenges that must be addressed in the design of these machines. First, the Beam Delivery System (BDS) must not add significant length to the accelerator complex. Second, beam parameters must be chosen to mitigate beamstrahlung effects and maximize the luminosity-per-power of the machine. In this paper, we review advances in plasma lens technology that will help to reduce the length of the BDS system and we detail new Particle-in-Cell simulation studies that will provide insight into beamstrahlung mitigation techniques. We apply our analysis to both $e^+e^-$ and $γγ$ colliders.

preprint2022arXiv

A Closer Look at Blind Super-Resolution: Degradation Models, Baselines, and Performance Upper Bounds

Degradation models play an important role in Blind super-resolution (SR). The classical degradation model, which mainly involves blur degradation, is too simple to simulate real-world scenarios. The recently proposed practical degradation model includes a full spectrum of degradation types, but only considers complex cases that use all degradation types in the degradation process, while ignoring many important corner cases that are common in the real world. To address this problem, we propose a unified gated degradation model to generate a broad set of degradation cases using a random gate controller. Based on the gated degradation model, we propose simple baseline networks that can effectively handle non-blind, classical, practical degradation cases as well as many other corner cases. To fairly evaluate the performance of our baseline networks against state-of-the-art methods and understand their limits, we introduce the performance upper bound of an SR network for every degradation type. Our empirical analysis shows that with the unified gated degradation model, the proposed baselines can achieve much better performance than existing methods in quantitative and qualitative results, which are close to the performance upper bounds.

preprint2022arXiv

Event Detection Explorer: An Interactive Tool for Event Detection Exploration

Event Detection (ED) is an important task in natural language processing. In the past few years, many datasets have been introduced for advancing ED machine learning models. However, most of these datasets are under-explored because not many tools are available for people to study events, trigger words, and event mention instances systematically and efficiently. In this paper, we present an interactive and easy-to-use tool, namely ED Explorer, for ED dataset and model exploration. ED Explorer consists of an interactive web application, an API, and an NLP toolkit, which can help both domain experts and non-experts to better understand the ED task. We use ED Explorer to analyze a recent proposed large-scale ED datasets (referred to as MAVEN), and discover several underlying problems, including sparsity, label bias, label imbalance, and debatable annotations, which provide us with directions to improve the MAVEN dataset. The ED Explorer can be publicly accessed through http://edx.leafnlp.org/. The demonstration video is available here https://www.youtube.com/watch?v=6QPnxPwxg50.

preprint2022arXiv

Probabilistic Consensus on Feature Distribution for Multi-robot Systems with Markovian Exploration Dynamics

In this paper, we present a consensus-based decentralized multi-robot approach to reconstruct a discrete distribution of features, modeled as an occupancy grid map, that represent information contained in a bounded planar 2D environment, such as visual cues used for navigation or semantic labels associated with object detection. The robots explore the environment according to a random walk modeled by a discrete-time discrete-state (DTDS) Markov chain and estimate the feature distribution from their own measurements and the estimates communicated by neighboring robots, using a distributed Chernoff fusion protocol. We prove that under this decentralized fusion protocol, each robot's feature distribution converges to the ground truth distribution in an almost sure sense. We verify this result in numerical simulations that show that the Hellinger distance between the estimated and ground truth feature distributions converges to zero over time for each robot. We also validate our strategy through Software-In-The-Loop (SITL) simulations of quadrotors that search a bounded square grid for a set of visual features distributed on a discretized circle.

preprint2020arXiv

Convexification for an Inverse Parabolic Problem

A convexification-based numerical method for a Coefficient Inverse Problem for a parabolic PDE is presented. The key element of this method is the presence of the so-called Carleman Weight Function in the numerical scheme. Convergence analysis ensures the global convergence of this method, as opposed to the local convergence of the conventional least squares minimization techniques. Numerical results demonstrate a good performance.

preprint2020arXiv

Design and Control of SQUEEZE: A Spring-augmented QUadrotor for intEractions with the Environment to squeeZE-and-fly

This paper presents the design and control of a novel quadrotor with a variable geometry to physically interact with cluttered environments and fly through narrow gaps and passageways. This compliant quadrotor with passive morphing capabilities is designed using torsional springs at every arm hinge to allow for rotation driven by external forces. We derive the dynamic model of this variable geometry quadrotor (SQUEEZE), and develop an adaptive controller for trajectory tracking. The corresponding Lyapunov stability proof of attitude tracking is also presented. Further, an admittance controller is designed to account for changes in yaw due to physical interactions with the environment. Finally, the proposed design is validated in flight tests with two setups: a small gap and a passageway. The experimental results demonstrate the unique capability of the SQUEEZE in navigating through constrained narrow spaces.

preprint2020arXiv

Predictive Modeling of Periodic Behavior for Human-Robot Symbiotic Walking

We propose in this paper Periodic Interaction Primitives - a probabilistic framework that can be used to learn compact models of periodic behavior. Our approach extends existing formulations of Interaction Primitives to periodic movement regimes, i.e., walking. We show that this model is particularly well-suited for learning data-driven, customized models of human walking, which can then be used for generating predictions over future states or for inferring latent, biomechanical variables. We also demonstrate how the same framework can be used to learn controllers for a robotic prosthesis using an imitation learning approach. Results in experiments with human participants indicate that Periodic Interaction Primitives efficiently generate predictions and ankle angle control signals for a robotic prosthetic ankle, with MAE of 2.21 degrees in 0.0008s per inference. Performance degrades gracefully in the presence of noise or sensor fall outs. Compared to alternatives, this algorithm functions 20 times faster and performed 4.5 times more accurately on test subjects.