Researcher profile

Lingchao Zheng

Lingchao Zheng contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 - Emerging · Verification L1 · Unclaimed author
6 works
0 followers
7 topics
4 close collaborators




Published work

6 published items

Preprint · 2026 · arXiv

AIS: Adaptive Importance Sampling for Quantized RL

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation transitions into a destabilizing source of bias as the policy concentrates. To solve this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.
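The core correction described above, a single coefficient interpolating between the uncorrected and the fully importance-weighted gradient, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes per-token ratios $\pi_{\text{trainer}}/\pi_{\text{rollout}}$ computed from log-probabilities, takes the mixing coefficient `alpha` as given, and includes a normalized effective sample size only as a hypothetical example of a "weight reliability"-style diagnostic. How the paper actually combines its three diagnostics into one coefficient is not reproduced here, and all function names are hypothetical.

```python
import math
from typing import List


def importance_ratios(logp_trainer: List[float], logp_rollout: List[float]) -> List[float]:
    """Per-token ratios pi_trainer / pi_rollout, computed from log-probabilities."""
    return [math.exp(t - r) for t, r in zip(logp_trainer, logp_rollout)]


def normalized_ess(ratios: List[float]) -> float:
    """Normalized effective sample size (sum w)^2 / (n * sum w^2), in (0, 1].

    Near 1: weights are nearly uniform. Small: a few tokens dominate.
    Hypothetical example diagnostic only; not the paper's definition.
    """
    total = sum(ratios)
    total_sq = sum(w * w for w in ratios)
    return total * total / (len(ratios) * total_sq)


def mixed_weights(ratios: List[float], alpha: float) -> List[float]:
    """Interpolate per-token gradient weights.

    alpha = 0 -> uncorrected (every weight is 1, the mismatch is ignored)
    alpha = 1 -> fully importance-weighted (weight equals the ratio)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) + alpha * w for w in ratios]
```

In a GRPO-style loss, each token's policy-gradient term would be multiplied by its mixed weight; with `alpha` chosen per batch, the correction can stay weak while the mismatch is still helping exploration and grow as it turns into bias.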

Preprint · 2026 · arXiv

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference

Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, which converts low-bit weights back to high precision for matrix multiplication, has become a critical bottleneck on modern AI accelerators. On architectures with decoupled compute units (e.g., Ascend NPUs), dequantization operations can consume more cycles than the matrix multiplication itself, leaving the high-throughput tensor cores underutilized. This paper presents Multi-Scale Dequant (MSD), a quantization framework that removes weight/KV dequantization from the GEMM critical path. Instead of lifting low-bit weights to BF16 precision, MSD decomposes high-precision BF16 activations into multiple low-precision components, each of which can be multiplied directly with the quantized weights via native hardware-accelerated GEMM. This shifts the computational paradigm from precision conversion to multi-scale approximation, avoiding INT8-to-BF16 weight conversion before GEMM. We instantiate MSD for two weight formats and derive tight error bounds for each. For INT8 weights (W8A16), two-pass INT8 decomposition achieves nearly 16 effective bits. For MXFP4 weights (W4A16), two-pass MXFP4 decomposition yields nearly 6.6 effective bits with an error bound of 1/64 per block, surpassing single-pass MXFP8 (5.24 bits) while keeping the same effective GEMM compute time. We further derive closed-form latency and HBM traffic models showing that MSD avoids the Vector-Cube pipeline stall caused by dequantization and reduces KV cache HBM traffic in attention by up to 2.5×. Numerical simulations on matrix multiplication and Flash Attention kernels confirm that MSD does not degrade accuracy compared with dequantization baselines, and in many settings achieves lower L2 error.
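The two-pass INT8 idea can be illustrated with a small sketch: write an activation vector as $x \approx s_1 q_1 + s_2 q_2$ with INT8 vectors $q_1, q_2$, so that $x \cdot w = s_1 (q_1 \cdot w) + s_2 (q_2 \cdot w)$ needs two integer dot products and no weight dequantization. This is a simplified illustration, not the paper's kernel: it assumes one symmetric scale per vector (rather than per-block scales), round-to-nearest, and a pure-Python integer dot product standing in for a native INT8 GEMM. Function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float], scale: float) -> List[int]:
    """Round to nearest and clamp to the symmetric INT8 range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def two_pass_decompose(x: List[float]) -> Tuple[float, List[int], float, List[int]]:
    """Split x into s1 * q1 + s2 * q2 with q1, q2 INT8 vectors (per-vector scales assumed)."""
    amax = max(abs(v) for v in x)
    s1 = amax / 127 if amax > 0 else 1.0
    q1 = quantize_int8(x, s1)
    # First-pass residual satisfies |r| <= s1 / 2, so it fits INT8 at scale s1 / 254.
    residual = [v - s1 * q for v, q in zip(x, q1)]
    s2 = s1 / 254
    q2 = quantize_int8(residual, s2)
    return s1, q1, s2, q2


def int_dot(q: List[int], w: List[int]) -> int:
    """Integer dot product; stands in for a hardware INT8 GEMM."""
    return sum(a * b for a, b in zip(q, w))


def msd_dot(x: List[float], w_int8: List[int], w_scale: float) -> float:
    """Approximate x . (w_scale * w_int8) without converting the weights to floats."""
    s1, q1, s2, q2 = two_pass_decompose(x)
    return w_scale * (s1 * int_dot(q1, w_int8) + s2 * int_dot(q2, w_int8))
```

With these scales the per-element reconstruction error is at most $s_2/2$, and only scalars are rescaled after the integer products, which is the property that keeps dequantization off the GEMM path.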

Preprint · 2021 · arXiv

Direct Flux Gradient Approximation to Close Moment Model for Kinetic Equations

To close a moment model derived from kinetic equations, the canonical approach is to approximate the flux function that cannot be expressed in terms of the moments of the reduced model. In this paper, we propose a new closure approach with notable advantages over the canonical approach. Instead of approximating the flux function, the new approach closes the moment model by approximating the flux gradient. Precisely, we approximate the spatial derivative of the distribution function by an ansatz that is a weighted polynomial, and the derivative of the closing flux is computed by taking moments of the ansatz. Consequently, the method provides an improved framework for deriving globally hyperbolic moment models that preserve all conservative variables in the low-order moments. We show that the linearization of the resulting moment model at the weight function, which is often the local equilibrium, automatically coincides with the system derived from classical perturbation theory, a property the previous hyperbolic regularization framework does not satisfy. Taking the Boltzmann equation as an example, the linearized moment model yields the correct Navier-Stokes-Fourier law, the same as that given by the Chapman-Enskog expansion. Most existing globally hyperbolic moment models are reproduced by the new approach, and several new models are proposed within this framework.
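A toy version of the flux-gradient closure helps make the idea concrete. The sketch assumes a 1D velocity variable, a fixed standard Gaussian weight $\omega(v)$ (in the paper the weight is typically the local equilibrium), and moments $m_k = \int v^k f\,dv$. It writes $\partial_x f \approx \omega(v) \sum_{j=0}^{M} g_j v^j$, chooses the $g_j$ so the ansatz reproduces the known $\partial_x m_0, \dots, \partial_x m_M$ (an assumed matching rule), and returns the closing term $\partial_x m_{M+1} \approx \sum_j g_j \mu_{M+1+j}$, where $\mu_n$ are the Gaussian moments. Function names are hypothetical; exact rational arithmetic is used so the result can be checked by hand.

```python
from fractions import Fraction
from typing import List


def gaussian_moment(n: int) -> Fraction:
    """n-th moment of the standard Gaussian weight: 0 for odd n, (n-1)!! for even n."""
    if n % 2 == 1:
        return Fraction(0)
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return Fraction(result)


def solve(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Solve a nonsingular square system exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def closing_flux_gradient(dx_moments: List[int]) -> Fraction:
    """Given d/dx of m_0..m_M, return the closure for d/dx m_{M+1}."""
    order = len(dx_moments) - 1
    # Hankel matrix of weight moments: row k, column j holds mu_{k+j}.
    hankel = [[gaussian_moment(k + j) for j in range(order + 1)] for k in range(order + 1)]
    coeffs = solve(hankel, [Fraction(d) for d in dx_moments])
    return sum(c * gaussian_moment(order + 1 + j) for j, c in enumerate(coeffs))
```

For example, if $\partial_x f = \omega(v)(1 + 2v + 3v^2)$ exactly, then $(\partial_x m_0, \partial_x m_1, \partial_x m_2) = (4, 2, 10)$ and `closing_flux_gradient([4, 2, 10])` recovers the exact value $\partial_x m_3 = 2\mu_4 = 6$. The closure is exact whenever the true derivative lies in the ansatz space.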

Preprint · 2020 · arXiv

A Nonlinear Hyperbolic Model for Radiative Transfer Equation in Slab Geometry

Linear models for the radiative transfer equation are well developed, while nonlinear models are seldom investigated, even for slab geometry, owing to some essential difficulties. We previously proposed a moment model, MPN, for slab geometry that combines the ideas of the classical PN and MN models. Although far from perfect, that model proved quite efficient at numerically approximating the solution of the radiative transfer equation, which motivated us to improve it further. In this paper we therefore propose a new model that follows the chart map of MPN and makes significant theoretical progress. The new model is globally hyperbolic by construction and preserves several necessary physical properties. We give a complete analysis of its characteristic structure and propose a numerical scheme for the new model. Numerical examples are presented to demonstrate the numerical performance of the new model.
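For comparison, the classical linear PN model mentioned above is hyperbolic in slab geometry: with Legendre moments $I_l = \int_{-1}^{1} P_l(\mu) I\,d\mu$ and the closure $I_{N+1} = 0$, the recurrence $\mu P_l = \big((l+1)P_{l+1} + l P_{l-1}\big)/(2l+1)$ gives a tridiagonal flux Jacobian whose eigenvalues are the roots of $P_{N+1}$, all real, distinct, and inside $(-1, 1)$. The sketch below illustrates that baseline only; it is not the MPN model or the new nonlinear model. It assumes the speed of light is normalized to 1, and the function names are hypothetical.

```python
from typing import List


def legendre_values(n: int, x: float) -> List[float]:
    """Return [P_0(x), ..., P_n(x)] via (l+1) P_{l+1} = (2l+1) x P_l - l P_{l-1}."""
    values = [1.0, x]
    for l in range(1, n):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values[: n + 1]


def pn_flux_matrix(n: int) -> List[List[float]]:
    """Flux Jacobian for the Legendre moments I_0..I_n, closed with I_{n+1} = 0."""
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for l in range(n + 1):
        if l > 0:
            a[l][l - 1] = l / (2 * l + 1)
        if l < n:
            a[l][l + 1] = (l + 1) / (2 * l + 1)
    return a


def pn_characteristic_speeds(n: int, intervals: int = 1999) -> List[float]:
    """Roots of P_{n+1} on (-1, 1): scan for sign changes, then bisect."""
    def p_next(x: float) -> float:
        return legendre_values(n + 1, x)[-1]

    # An odd number of intervals keeps x = 0 (a root of every odd-degree P) off the grid.
    grid = [-1.0 + 2.0 * i / intervals for i in range(intervals + 1)]
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        f_lo = p_next(lo)
        if f_lo * p_next(hi) < 0:
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                f_mid = p_next(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots
```

For each root $x$, the vector $(P_0(x), \dots, P_N(x))$ is a right eigenvector of `pn_flux_matrix(N)` with eigenvalue $x$; for $N = 1$ the speeds are $\pm 1/\sqrt{3}$.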

Preprint · 2020 · arXiv

A Nonlinear Moment Model for Radiative Transfer Equation

We derive a nonlinear moment model for the radiative transfer equation in 3D space, using the same method previously used to derive the nonlinear moment model for the radiative transfer equation in slab geometry. The resulting 3D HMPN model enjoys a list of mathematical advantages, including global hyperbolicity, rotational invariance, physical wave speeds, spectral accuracy, and the correct higher-order Eddington approximation. Simulation examples are presented to validate the new model numerically.

Preprint · 2019 · arXiv

A Nonlinear Three-Moment Model for Radiative Transfer in Spherical Symmetry

We study the approximation of the radiative transfer equation with relatively few moments in the spherically symmetric case. We propose a three-moment model based on choosing the beta distribution as the ansatz for the specific intensity. This ansatz enables the model to capture the anisotropy of the distribution function. The characteristic structure of the Riemann problem for the model is studied in detail. Numerical simulations demonstrate its validity in approximating the radiative transfer equation in the spherically symmetric case and its advantage over the $P_n$ method in approximating highly anisotropic distribution functions.
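One way to see how a beta ansatz can be determined by three moments is a method-of-moments fit. The sketch assumes the ansatz $I(\mu) = c\,t^{a-1}(1-t)^{b-1}$ with $t = (1+\mu)/2$ on $\mu \in [-1, 1]$, and the three moments $E_k = \int_{-1}^{1} \mu^k I\,d\mu$ for $k = 0, 1, 2$. The paper's exact parameterization and choice of moments may differ, and the function name is hypothetical. Two shape parameters plus the amplitude $c$ give three unknowns for three moments.

```python
import math
from typing import Tuple


def fit_beta_ansatz(e0: float, e1: float, e2: float) -> Tuple[float, float, float]:
    """Fit I(mu) = c * t**(a - 1) * (1 - t)**(b - 1), t = (1 + mu) / 2, to three moments.

    e0, e1, e2 are int_{-1}^{1} mu**k I(mu) dmu for k = 0, 1, 2 (assumed moment set).
    Returns (c, a, b). Raises ValueError for moments a beta ansatz cannot realize.
    """
    if e0 <= 0:
        raise ValueError("e0 must be positive")
    mean_mu = e1 / e0
    second_mu = e2 / e0
    # Change variables mu -> t = (1 + mu) / 2 so the support becomes [0, 1].
    mean_t = (1.0 + mean_mu) / 2.0
    second_t = (1.0 + 2.0 * mean_mu + second_mu) / 4.0
    var_t = second_t - mean_t ** 2
    if not (0.0 < mean_t < 1.0 and 0.0 < var_t < mean_t * (1.0 - mean_t)):
        raise ValueError("moments are not realizable by a beta ansatz")
    # Method of moments for the beta distribution.
    common = mean_t * (1.0 - mean_t) / var_t - 1.0
    a = mean_t * common
    b = (1.0 - mean_t) * common
    # Normalize so that int_{-1}^{1} I dmu = e0, using dmu = 2 dt and
    # int_0^1 t^(a-1) (1-t)^(b-1) dt = B(a, b).
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    c = e0 / (2.0 * math.exp(log_beta))
    return c, a, b
```

For example, the moments $(E_0, E_1, E_2) = (1, -0.2, 0.2)$ correspond to $a = 2$, $b = 3$, $c = 6$. Because $a$ and $b$ are free, the fitted intensity can become strongly peaked toward $\mu = \pm 1$, which a low-order polynomial ($P_n$) expansion represents poorly.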