Researcher profile

Haochen Li

Haochen Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive-achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. A brief overview is available at https://z.ai/blog/glm-4.6v. Code, models and more information are released at https://github.com/zai-org/GLM-V.

preprint2026arXiv

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

preprint2026arXiv

Searth Transformer: A Transformer Architecture Incorporating Earth's Geospheric Physical Priors for Global Mid-Range Weather Forecasting

Accurate global medium-range weather forecasting is fundamental to Earth system science. Most existing Transformer-based forecasting models adopt vision-centric architectures that neglect the Earth's spherical geometry and zonal periodicity. In addition, conventional autoregressive training is computationally expensive and limits forecast horizons due to error accumulation. To address these challenges, we propose the Shifted Earth Transformer (Searth Transformer), a physics-informed architecture that incorporates zonal periodicity and meridional boundaries into window-based self-attention for physically consistent global information exchange. We further introduce a Relay Autoregressive (RAR) fine-tuning strategy that enables learning long-range atmospheric evolution under constrained memory and computational budgets. Based on these methods, we develop YanTian, a global medium-range weather forecasting model. YanTian achieves higher accuracy than the high-resolution forecast of the European Centre for Medium-Range Weather Forecasts and performs competitively with state-of-the-art AI models at one-degree resolution, while requiring roughly 200 times lower computational cost than standard autoregressive fine-tuning. Furthermore, YanTian attains a longer skillful forecast lead time for Z500 (10.3 days) than HRES (9 days). Beyond weather forecasting, this work establishes a robust algorithmic foundation for predictive modeling of complex global-scale geophysical circulation systems, offering new pathways for Earth system science.

preprint2025arXiv

Pinching Antenna Systems for Integrated Sensing and Communications

Recently, the pinching antenna system (PASS) has attracted considerable attention due to their advantages in flexible deployment and reduction of signal propagation loss. In this work, a multiple waveguide PASS assisted integrated sensing and communication (ISAC) system is proposed, where the base station (BS) is equipped with transmitting pinching antennas (PAs) and receiving uniform linear array (ULA) antennas. The full-duplex (FD) BS transmits the communication and sensing signals through the PAs on waveguides and collects the echo sensing signals with the mounted ULA. Based on this configuration, a target sensing Cramer Rao Bound (CRB) minimization problem is formulated under communication quality-of-service (QoS) constraints, power budget constraint, and PA deployment constraints. The alternating optimization (AO) method is employed to address the formulated non-convex optimization problem. In each iteration, the overall optimization problem is decomposed into a digital beamforming sub-problem and a pinching beamforming sub-problem. The sensing covariance matrix and communication beamforming matrix at the BS are optimized by solving the digital beamforming sub-problem with semidefinite relaxation (SDR). The PA deployment is updated by solving the pinching beamforming sub-problem with the successive convex approximation (SCA) method, penalty method, and element-wise optimization. Simulation results show that the proposed PASS assisted ISAC framework achieves superior performance over benchmark schemes, is less affected by stringent communication constraints compared to conventional MIMO-ISAC, and benefits further from increasing the number of waveguides and PAs per waveguide.

preprint2023arXiv

Turbulence Model Development based on a Novel Method Combining Gene Expression Programming with an Artificial Neural Network

Data-driven methods are widely used to develop physical models, but there still exist limitations that affect their performance, generalizability and robustness. By combining gene expression programming (GEP) with artificial neural network (ANN), we propose a novel method for symbolic regression called the gene expression programming neural network (GEPNN). In this method, candidate expressions generated by evolutionary algorithms are transformed between the GEP and ANN structures during training iterations, and efficient and robust convergence to accurate models is achieved by combining the GEP's global searching and the ANN's gradient optimization capabilities. In addition, sparsity-enhancing strategies have been introduced to GEPNN to improve the interpretability of the trained models. The GEPNN method has been tested for finding different physical laws, showing improved convergence to models with precise coefficients. Furthermore, for large-eddy simulation of turbulence, the subgrid-scale stress model trained by GEPNN significantly improves the prediction of turbulence statistics and flow structures over traditional models, showing advantages compared to the existing GEP and ANN methods in both a priori and a posteriori tests.