Source author record

Yuwei Fan

Yuwei Fan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math-ph math.MP math.NA Numerical Analysis Machine Learning Artificial Intelligence physics.comp-ph cond-mat.stat-mech physics.flu-dyn

Catalog footprint

What is connected

15 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

AIS: Adaptive Importance Sampling for Quantized RL

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation becomes a destabilizing source of bias as the policy concentrates. To address this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.
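
The abstract describes AIS only at the level of three diagnostics folded into one mixing coefficient. A minimal sketch, assuming sequence-level log-probabilities from both the BF16 trainer and the FP8 rollout policy: the name `ais_weights`, the concrete diagnostics (normalized effective sample size, a Monte Carlo KL estimate, weight variance) and their multiplicative combination are hypothetical choices for illustration, not the paper's definitions.

```python
import numpy as np


def ais_weights(logp_trainer, logp_rollout, kl_scale=0.05):
    """Hypothetical AIS-style per-batch correction (illustrative formulas only).

    logp_trainer: per-sample log-probs under the BF16 trainer policy, shape (n,)
    logp_rollout: per-sample log-probs under the FP8 rollout policy, shape (n,)
    Returns (alpha, mixed): mixed[i] scales sample i's policy-gradient term.
    """
    log_ratio = np.asarray(logp_trainer, float) - np.asarray(logp_rollout, float)
    w = np.exp(log_ratio)  # importance weights pi_trainer / pi_rollout

    # 1) Weight reliability: normalized effective sample size, in (0, 1].
    reliability = w.sum() ** 2 / (len(w) * np.sum(w ** 2))

    # 2) Divergence severity: Monte Carlo estimate of KL(rollout || trainer)
    #    on rollout samples, clamped at 0 and squashed into [0, 1).
    kl = max(float(np.mean(-log_ratio)), 0.0)
    severity = 1.0 - np.exp(-kl / kl_scale)

    # 3) Variance amplification: shrink the correction when weights spread out.
    damping = 1.0 / (1.0 + np.var(w))

    # Assumed combination into a single mixing coefficient in [0, 1].
    alpha = float(np.clip(severity * reliability * damping, 0.0, 1.0))

    # Interpolate between uncorrected (weight 1) and fully importance-weighted (w).
    mixed = (1.0 - alpha) + alpha * w
    return alpha, mixed
```

With no mismatch the coefficient is zero and every sample keeps weight 1; as the estimated divergence grows, the sketch moves toward full importance weighting, while low reliability or high weight variance pulls it back.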

preprint · 2026 · arXiv

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference

Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, which converts low-bit weights back to high precision for matrix multiplication, has become a critical bottleneck on modern AI accelerators. On architectures with decoupled compute units (e.g., Ascend NPUs), dequantization operations can consume more cycles than the matrix multiplication itself, leaving the high-throughput tensor cores underutilized. This paper presents Multi-Scale Dequant (MSD), a quantization framework that removes weight/KV dequantization from the GEMM critical path. Instead of lifting low-bit weights to BF16 precision, MSD decomposes high-precision BF16 activations into multiple low-precision components, each of which can be multiplied directly with quantized weights via native hardware-accelerated GEMM. This approach shifts the computational paradigm from precision conversion to multi-scale approximation, avoiding INT8-to-BF16 weight conversion before GEMM. We instantiate MSD for two weight formats and derive tight error bounds for each. For INT8 weights (W8A16), two-pass INT8 decomposition achieves nearly 16 effective bits. For MXFP4 weights (W4A16), two-pass MXFP4 decomposition yields nearly 6.6 effective bits with an error bound of 1/64 per block, surpassing single-pass MXFP8 (5.24 bits) while maintaining the same effective GEMM compute time. We further derive closed-form latency and HBM traffic models showing that MSD avoids the Vector-Cube pipeline stall caused by dequantization and reduces KV cache HBM traffic by up to 2.5 times in attention. Numerical simulations on matrix multiplication and Flash Attention kernels confirm that MSD does not degrade accuracy compared to dequantization baselines, and in many settings achieves lower L2 error.
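
The core idea, splitting the activation instead of upcasting the weights, can be illustrated with NumPy. A minimal sketch, assuming a single per-tensor scale for the INT8 weights and one per-tensor scale per activation pass (the paper works per block); `decompose_int8` and `msd_matvec` are hypothetical names, and the int32 products stand in for native INT8 GEMM.

```python
import numpy as np


def decompose_int8(x, passes=2):
    """Split a float vector into `passes` INT8 components: x ~= sum(scale * q)."""
    comps = []
    residual = np.asarray(x, dtype=np.float64)
    for _ in range(passes):
        max_abs = np.max(np.abs(residual))
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q = np.clip(np.round(residual / scale), -127, 127).astype(np.int8)
        comps.append((scale, q))
        # The next pass quantizes what this pass could not represent.
        residual = residual - scale * q.astype(np.float64)
    return comps


def msd_matvec(w_int8, w_scale, x, passes=2):
    """Approximate (w_scale * W) @ x without ever converting W to floating point."""
    y = np.zeros(w_int8.shape[0])
    for scale, q in decompose_int8(x, passes):
        # Integer product with int32 accumulation, as on integer tensor units.
        acc = w_int8.astype(np.int32) @ q.astype(np.int32)
        y += scale * acc
    return w_scale * y
```

After the first pass the residual is at most half a quantization step (max|x|/254), so a second INT8 pass shrinks the error by roughly another factor of 254, which is consistent with the "nearly 16 effective bits" figure for two passes.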

preprint · 2021 · arXiv

A Simple Multiscale Method for Mean Field Games

This paper proposes a multiscale method for computing numerical solutions of mean field games that accelerates convergence and addresses the problem of choosing an initial guess. Starting from an approximate solution at the coarsest level, the method constructs approximations on successively finer grids via alternating sweeping, which not only allows the use of classical time-marching numerical schemes but also enables applications to both local and nonlocal problems. At each level, numerical relaxation is used to stabilize the iterative process. A second-order discretization scheme is derived for higher-order convergence. Numerical examples demonstrate the efficiency of the proposed method in local and nonlocal, one- and two-dimensional cases.
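
The coarse-to-fine structure and the relaxation step can be sketched independently of the mean field game itself. A minimal sketch, assuming a generic fixed-point map `F` on each grid in place of the paper's alternating HJB/Fokker-Planck sweeps; `multiscale_solve`, `relaxed_fixed_point` and the toy map in the usage line are hypothetical.

```python
import numpy as np


def relaxed_fixed_point(F, u0, omega=0.5, tol=1e-10, max_iter=500):
    """Relaxed iteration u <- (1 - omega) * u + omega * F(u) on a single grid."""
    u = u0.copy()
    for _ in range(max_iter):
        u_new = (1.0 - omega) * u + omega * F(u)
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u


def multiscale_solve(make_F, n_coarse=9, levels=4, omega=0.5):
    """Coarse-to-fine solve: each level's solution, interpolated, seeds the next."""
    n = n_coarse
    x = np.linspace(0.0, 1.0, n)
    u = np.zeros(n)  # only the coarsest level needs an arbitrary initial guess
    for level in range(levels):
        u = relaxed_fixed_point(make_F(x), u, omega)
        if level < levels - 1:
            n = 2 * n - 1
            x_fine = np.linspace(0.0, 1.0, n)
            u = np.interp(x_fine, x, u)  # prolongation to the finer grid
            x = x_fine
    return x, u


# Toy usage (hypothetical map): F(u) = sin(pi x) + 0.5 * mean(u),
# whose exact fixed point is sin(pi x) + mean(sin(pi x)).
x, u = multiscale_solve(lambda x: (lambda u: np.sin(np.pi * x) + 0.5 * np.mean(u)))
```

Only the coarsest level starts from an arbitrary guess; every finer level starts from an interpolated solution that is already close, which is how the multiscale strategy sidesteps the initial-guess problem.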

preprint · 2021 · arXiv

Multi-Level Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data

In scientific machine learning, regression networks have recently been applied to approximate solution maps (e.g., the potential-to-ground-state map of the Schrödinger equation). However, to reduce the generalization error, the regression network needs to be fit on a large number of training samples (e.g., a collection of potential-ground state pairs). The training samples are produced by running numerical solvers, which is time-consuming in many applications. In this paper, we aim to reduce the generalization error without spending more time generating training samples. Inspired by few-shot learning techniques, we develop the Multi-Level Fine-Tuning algorithm by introducing levels of training: we first train the regression network on samples generated on the coarsest grid and then successively fine-tune the network on samples generated on finer grids. Within the same amount of time, numerical solvers generate more samples on coarse grids than on fine grids. We demonstrate a significant reduction of generalization error in numerical experiments on challenging problems with oscillations, discontinuities, or rough coefficients. Further analysis can be conducted in the Neural Tangent Kernel regime, and we provide practical estimators for the generalization error. The number of training samples at different levels can be optimized for the smallest estimated generalization error under a fixed training-data budget. The optimized distribution of the budget over levels provides practical guidance with theoretical insight, as in the celebrated Multi-Level Monte Carlo algorithm.
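
The budget question mirrors sample allocation in Multi-Level Monte Carlo. As an analogy only, the classical MLMC rule minimizes sum_l V_l / n_l subject to sum_l n_l C_l = B, which gives n_l proportional to sqrt(V_l / C_l); here V_l and C_l are assumed per-level error contributions and per-sample solver costs. This is not the paper's NTK-based estimator, and `allocate_budget` is a hypothetical name.

```python
import numpy as np


def allocate_budget(variances, costs, budget):
    """MLMC-style allocation: minimize sum(V_l / n_l) s.t. sum(n_l * C_l) = budget.

    A Lagrange-multiplier argument gives n_l = budget * sqrt(V_l / C_l) / sum_k sqrt(V_k * C_k).
    """
    V = np.asarray(variances, float)
    C = np.asarray(costs, float)
    return budget * np.sqrt(V / C) / np.sum(np.sqrt(V * C))


# Example: costs grow 4x per level, contributions shrink 4x per level.
n = allocate_budget([1.0, 0.25, 0.0625], [1.0, 4.0, 16.0], budget=100.0)
```

The result is real-valued; in practice one rounds to whole samples. Cheap coarse levels receive most of the samples, matching the observation that solvers produce more coarse-grid samples in the same time.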

preprint · 2020 · arXiv

A Nonlinear Hyperbolic Model for Radiative Transfer Equation in Slab Geometry

Linear models for the radiative transfer equation have been well developed, while nonlinear models are seldom investigated, even for slab geometry, due to some essential difficulties. We have previously proposed a moment model, MPN, for slab geometry, which combines the ideas of the classical PN and MN models. Though far from perfect, the model was demonstrated to be quite efficient in numerically approximating the solution of the radiative transfer equation, which motivates us to improve it further. In this paper we therefore propose a new model that follows the chartmap in MPN, with significant theoretical progress. The new model is derived with global hyperbolicity, while some necessary physical properties are preserved. We give a complete analysis of the characteristic structure and propose a numerical scheme for the new model. Numerical examples are presented to demonstrate the numerical performance of the new model.
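
Hyperbolicity of the linear baseline can be checked concretely. For the classical PN model in slab geometry (not MPN and not the new nonlinear model), the Legendre recurrence mu P_k = ((k+1) P_{k+1} + k P_{k-1}) / (2k+1) gives a tridiagonal flux matrix whose eigenvalues are the roots of P_{N+1}, so the characteristic speeds are real and distinct. A minimal sketch, assuming moments m_k = integral of P_k(mu) I dmu and the closure m_{N+1} = 0; `pn_flux_matrix` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def pn_flux_matrix(N):
    """Flux Jacobian A of the classical P_N system d_t m + A d_x m = 0 (m_{N+1} = 0)."""
    A = np.zeros((N + 1, N + 1))
    for k in range(N + 1):
        if k + 1 <= N:
            A[k, k + 1] = (k + 1) / (2 * k + 1)
        if k >= 1:
            A[k, k - 1] = k / (2 * k + 1)
    return A


N = 7
speeds = np.sort(np.linalg.eigvals(pn_flux_matrix(N)).real)
nodes = np.sort(leggauss(N + 1)[0])  # roots of the Legendre polynomial P_{N+1}
```

Because the vector (P_0(mu), ..., P_N(mu)) is an eigenvector of A exactly when P_{N+1}(mu) = 0, the speeds coincide with the Gauss-Legendre nodes and stay inside [-1, 1].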

preprint · 2020 · arXiv

Hyperbolic Model Reduction for Kinetic Equations

We give a brief historical review of moment model reduction for kinetic equations, particularly Grad's moment method for the Boltzmann equation. The focus is on the hyperbolicity of the reduced model, which is essential to the existence of its classical solution as a Cauchy problem. We then introduce the theory of the framework we have developed in recent years, which can preserve the hyperbolic nature of the kinetic equations with high universality. Some of the latest progress in comparing models with and without hyperbolicity is presented to validate the hyperbolic moment models for rarefied gases.
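
At a given state, a first-order system d_t w + A(w) d_x w = 0 is hyperbolic when A(w) has real eigenvalues and a complete set of eigenvectors. A minimal numerical check of that definition, assuming the Jacobian is available as a dense matrix; `is_hyperbolic` and both tolerances are illustrative choices, and a condition-number cutoff is only a heuristic for diagonalizability.

```python
import numpy as np


def is_hyperbolic(A, imag_tol=1e-10, cond_tol=1e8):
    """Heuristic test: real spectrum plus a well-conditioned eigenvector basis."""
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    scale = max(1.0, float(np.max(np.abs(eigvals))))
    real_spectrum = bool(np.all(np.abs(eigvals.imag) <= imag_tol * scale))
    # Nearly parallel eigenvectors (huge condition number) signal a Jordan block.
    diagonalizable = bool(np.linalg.cond(eigvecs) < cond_tol)
    return real_spectrum and diagonalizable


wave = np.array([[0.0, 1.0], [1.0, 0.0]])      # real speeds +-1: hyperbolic
rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])  # speeds +-i: not hyperbolic
jordan = np.array([[1.0, 1.0], [0.0, 1.0]])     # real but defective: not hyperbolic
```

Sampling such a check over states is one way to visualize where a moment system such as Grad's loses hyperbolicity.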

preprint · 2020 · arXiv

Meta-learning Pseudo-differential Operators with Deep Neural Networks

This paper introduces a meta-learning approach for parameterized pseudo-differential operators with deep neural networks. With the help of the nonstandard wavelet form, the pseudo-differential operators can be approximated in a compressed form with a collection of vectors. The nonlinear map from the parameter to this collection of vectors and the wavelet transform are learned together from a small number of matrix-vector multiplications of the pseudo-differential operator. Numerical results for Green's functions of elliptic partial differential equations and the radiative transfer equations demonstrate the efficiency and accuracy of the proposed approach.
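
The nonstandard wavelet form used in the paper is more involved, but the reason wavelet representations compress such operators can be seen in the simpler standard form. A minimal sketch, assuming an orthonormal Haar basis and a matrix size that is a power of two; `haar_matrix` and `standard_form` are hypothetical names, and the Gaussian kernel in the usage line is only a smooth stand-in for an operator kernel.

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal Haar transform matrix of size n x n (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])                # pairwise sums, transformed recursively
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-level pairwise differences
    return np.vstack([coarse, detail]) / np.sqrt(2.0)


def standard_form(K):
    """Standard-form wavelet coefficients H K H^T of an operator matrix K."""
    H = haar_matrix(K.shape[0])
    return H @ K @ H.T


x = np.linspace(0.0, 1.0, 64)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)  # smooth illustrative kernel
C = standard_form(K)
```

For a smooth kernel, detail coefficients are proportional to local differences and are therefore small, so most of `C` can be thresholded away; the transform is orthogonal, so `K` is recovered exactly from `C`.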

preprint · 2019 · arXiv

Solving Electrical Impedance Tomography with Deep Learning

This paper introduces a new approach for solving electrical impedance tomography (EIT) problems using deep neural networks. The mathematical problem of EIT is to recover the electrical conductivity from the Dirichlet-to-Neumann (DtN) map. Both the forward map from the electrical conductivity to the DtN map and the inverse map are high-dimensional and nonlinear. Motivated by the linear perturbative analysis of the forward map and based on a numerically low-rank property, we propose compact neural network architectures for the forward and inverse maps for both 2D and 3D problems. Numerical results demonstrate the efficiency of the proposed neural networks.

preprint · 2015 · arXiv

Model Reduction of Kinetic Equations by Operator Projection

Through further study of the mechanism of the hyperbolic regularization of the moment system for the Boltzmann equation proposed in [Z. Cai, Y. Fan, R. Li, Comm. Math. Sci. 11(2): 547-571, 2013], we point out that the key point is treating the time and space derivatives in the same way. Based on this understanding, we propose a uniform framework to derive globally hyperbolic moment systems from kinetic equations using an operator projection method. The framework is so concise and clear that it can be treated as an algorithm with four inputs that derives hyperbolic moment systems by routine calculations. Almost all existing globally hyperbolic moment systems can be included in the framework, as well as some new moment systems, including globally hyperbolic regularized versions of Grad's ordered moment systems and a multidimensional extension of the quadrature-based moment system.

preprint · 2014 · arXiv

A Framework on Moment Model Reduction for Kinetic Equation

Through further investigation of the structure of the coefficient matrix of the globally hyperbolic regularized moment equations for the Boltzmann equation in [Z. Cai, Y. Fan and R. Li, Comm. Math. Sci., 11 (2013), pp. 547-571], we propose a uniform framework for carrying out model reduction of general kinetic equations to moment systems. With this framework, the underlying reason why the globally hyperbolic regularization in [Z. Cai, Y. Fan and R. Li, Comm. Math. Sci., 11 (2013), pp. 547-571] works is revealed. Even more remarkably, with only routine calculations, existing models can be recovered and brand new models discovered. Even when the study is restricted to the scope of the classical Grad 13-moment system, a new model with global hyperbolicity can be deduced.

preprint · 2014 · arXiv

Globally Hyperbolic Moment System by Generalized Hermite Expansion

In a recent paper [Z.-N. Cai, Y.-W. Fan, and R. Li. Tech Report, Institute of Math, Peking University (2013)], it was revealed that a modified 13-moment system, which takes intrinsic heat fluxes as variables instead of the heat fluxes along the coordinate vectors adopted in the classical Grad 13-moment system, has some additional advantages over the classical Grad 13-moment system; in particular, the equilibrium becomes an interior point of its hyperbolicity region. The modified 13-moment system was derived from the generalized Hermite expansion of the distribution function, where the anisotropy of the Hermite expansion is specified by the full temperature tensor. In this paper we extend that method to higher orders of the generalized Hermite expansion to derive moment systems of arbitrary order, and we propose a globally hyperbolic regularization to achieve local well-posedness, similar to the method in [Z. Cai, Y. Fan, and R. Li, Comm. Pure Appl. Math. (online) (2013)]. Furthermore, the structure of the eigensystem of the coefficient matrix and all characteristic waves are fully clarified. The resulting systems provide a systematic class of hydrodynamic models as refined versions of the Euler equations, which approach the Boltzmann equation as the order of the expansion increases.

preprint · 2014 · arXiv

On Hyperbolicity of 13-Moment System

We point out that the thermodynamic equilibrium is not an interior point of the hyperbolicity region of Grad's 13-moment system. Using an expansion of the phase density that is more compact than Grad's expansion, we derive a modified 13-moment system. The new 13-moment system admits the thermodynamic equilibrium as an interior point of its hyperbolicity region. We deduce a concise criterion that ensures hyperbolicity, so the hyperbolicity region can be depicted quantitatively.

preprint · 2012 · arXiv

Globally Hyperbolic Regularization of Grad's Moment System

In this paper, we propose a globally hyperbolic regularization of the general Grad's moment system in multi-dimensional spaces. Systems with moments up to an arbitrary order are studied. The characteristic speeds of the regularized moment system can be given analytically and depend only on the macroscopic velocity and the temperature. The structure of the eigenvalues and eigenvectors of the coefficient matrix is fully clarified. Both the regularization and the properties of the resulting moment systems are consistent with the simpler one-dimensional case discussed in [1]. Moreover, all characteristic waves are proven to be genuinely nonlinear or linearly degenerate, and the properties of rarefaction waves, contact discontinuities and shock waves are studied.

preprint · 2012 · arXiv

Globally Hyperbolic Regularization of Grad's Moment System in One Dimensional Space

In this paper, we present a regularization of the 1D Grad's moment system that achieves global hyperbolicity. The regularization is based on the observation that the characteristic polynomial of the Jacobian of the flux in Grad's moment system is independent of the intermediate moments. The method does not rely on the form of the collision term at all, so the regularization is also applicable to systems without collision terms. Moreover, the proposed approach is proved to be the unique one if only the last moment equation may be altered to obtain characteristic speeds that are independent of the non-equilibrium moments. The hyperbolic structure of the regularized system, including the signal speeds, Riemann invariants and the properties of the characteristic waves (rarefaction wave, contact discontinuity and shock), is given in explicit form.
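
Our understanding of the 1D regularized system is that its characteristic speeds take the form u + C_j sqrt(theta), where u is the macroscopic velocity, theta the temperature, M the highest moment order, and C_j the roots of the probabilists' Hermite polynomial He_{M+1}; this matches the abstract's statement that the speeds do not depend on the intermediate moments. A sketch under that assumption (`regularized_speeds` is a hypothetical name):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def regularized_speeds(u, theta, M):
    """Assumed characteristic speeds u + C_j * sqrt(theta), C_j = roots of He_{M+1}."""
    roots, _ = hermegauss(M + 1)  # nodes of Gauss-HermiteE quadrature = roots of He_{M+1}
    return u + np.sqrt(theta) * np.sort(roots)


speeds_M2 = regularized_speeds(u=0.0, theta=1.0, M=2)
```

As a sanity check, M = 2 gives speeds u and u ± sqrt(3 theta), the sound speed of the 1D Euler equations with gamma = 3, and the speeds depend on nothing but u and theta.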

preprint · 2012 · arXiv

Quantum Hydrodynamic Model by Moment Closure of Wigner Equation

In this paper, we derive quantum hydrodynamic models based on the moment closure of the Wigner equation. The moment expansion adopted is of the Grad type first proposed in \cite{Grad}. Grad's moment method was originally developed for the Boltzmann equation. In \cite{Fan_new}, a regularization method for Grad's moment system of the Boltzmann equation was proposed to achieve global hyperbolicity, so that the local well-posedness of the moment system is attained. With the moment expansion of the Wigner function, the drift term in the Wigner equation has exactly the same moment representation as in the Boltzmann equation, so the regularization in \cite{Fan_new} applies. The moment expansion of the nonlocal Wigner potential term in the Wigner equation turns out to be a linear source term, which can only induce very mild growth of the solution. As a result, the local well-posedness of the regularized moment system for the Wigner equation holds just as it does for the Boltzmann equation.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint