Researcher profile

Juntao Huang

Juntao Huang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
6 topics
4 close collaborators


Published work

8 published items

preprint, 2026, arXiv

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs

Quantization is a key method for reducing the GPU memory requirement of training large language models (LLMs). Yet current approaches are ineffective for 4-bit activations and 8-bit gradients, which easily cause slow convergence or accuracy loss. To address this, we introduce AGoQ, incorporating two new techniques: 1) a layer-aware activation quantization algorithm that allocates appropriate bit-widths for activations of various layers based on their types and pipeline stages to achieve near 4-bit activation storage, and 2) a gradient quantization algorithm that reduces memory usage and shortens communication time by employing 8-bit gradient storage and precision-preserving 8-bit All-Reduce communication. We conduct extensive experiments using different sizes of LLMs on two GPU clusters (up to 64 GPUs), and the experimental results show that AGoQ reduces memory usage by up to 52% and achieves up to a 1.34× improvement in training speed compared to the state-of-the-art training systems Megatron-LM (w/ or w/o ZeRO), COAT and DeepSpeed with 8B to 32B LLaMA models, while achieving convergence loss on pretraining and comparable accuracy on downstream tasks with LLaMA architectures.
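To make the idea of 8-bit gradient storage concrete, here is a minimal sketch of symmetric per-tensor absmax quantization in plain Python. It is not the AGoQ algorithm, whose details the abstract does not give; the function names `quantize_int8` and `dequantize_int8` and the sample gradient values are assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor absmax quantization to signed 8-bit codes.

    Illustrative only: this is a generic scheme, not AGoQ's method.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map 8-bit codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Hypothetical gradient values, for demonstration only.
grads = [0.02, -0.5, 0.31, 0.0, -0.07]
codes, scale = quantize_int8(grads)
restored = dequantize_int8(codes, scale)
max_error = max(abs(g - r) for g, r in zip(grads, restored))
```

With round-to-nearest, the reconstruction error of each entry is bounded by half the scale, which is why a single large outlier in a tensor degrades the precision of every other entry. That sensitivity is one reason practical systems tend to use finer-grained scales.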

preprint, 2022, arXiv

Coupling conditions for linear hyperbolic relaxation systems in two-scales problems

This work is concerned with coupling conditions for linear hyperbolic relaxation systems with multiple relaxation times. In the region with small relaxation time, an equilibrium system can be used for computational efficiency. Under the assumption that the relaxation system satisfies the structural stability condition and the interface is non-characteristic, we derive a coupling condition at the interface to couple the two systems in a domain decomposition setting. We prove the validity by the energy estimate and Laplace transform, which shows how the error of the domain decomposition method depends on the smaller relaxation time and the boundary layer effects. In addition, we propose a discontinuous Galerkin (DG) scheme for solving the interface problem with the derived coupling condition and prove the L2 stability. We validate our analysis on the linearized Carleman model and the linearized Grad's moment system and show the effectiveness of the DG scheme.

preprint, 2022, arXiv

Machine learning moment closure models for the radiative transfer equation I: directly learning a gradient based closure

In this paper, we take a data-driven approach and apply machine learning to the moment closure problem for the radiative transfer equation in slab geometry. Instead of learning the unclosed high order moment, we propose to directly learn the gradient of the high order moment using neural networks. This new approach is consistent with the exact closure we derive for the free streaming limit and also provides a natural output normalization. A variety of benchmark tests, including the variable scattering problem, the Gaussian source problem with both periodic and reflecting boundaries, and the two-material problem, show both good accuracy and generalizability of our machine learning closure model.
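The closure problem described above can be written out as follows. This is a sketch under assumed notation that the abstract does not fix: $f(x,v,t)$ is the specific intensity with angular variable $v \in [-1,1]$, $\sigma_s$ and $\sigma_a$ are the scattering and absorption cross sections, $P_k$ is the Legendre polynomial of degree $k$, and $\mathcal{N}_i$ denotes a learned coefficient function.

```latex
% Radiative transfer equation in slab geometry (assumed form):
\partial_t f + v\,\partial_x f
  = \sigma_s\Big(\tfrac12\int_{-1}^{1} f\,dv - f\Big) - \sigma_a f .

% Legendre moments:
m_k(x,t) = \tfrac12\int_{-1}^{1} f(x,v,t)\,P_k(v)\,dv , \qquad k \ge 0 .

% Using (2k+1)\,v\,P_k = (k+1)\,P_{k+1} + k\,P_{k-1}, the moments satisfy
\partial_t m_0 + \partial_x m_1 = -\sigma_a m_0 ,
\qquad
\partial_t m_k + \frac{k}{2k+1}\,\partial_x m_{k-1}
             + \frac{k+1}{2k+1}\,\partial_x m_{k+1}
  = -(\sigma_s + \sigma_a)\,m_k , \quad k \ge 1 .

% Truncating at k = N leaves \partial_x m_{N+1} unclosed.
% A gradient-based closure models that gradient directly:
\partial_x m_{N+1} \approx \sum_{i=0}^{N}
  \mathcal{N}_i(m_0,\dots,m_N)\,\partial_x m_i .
```

The equation for $m_N$ only ever needs $\partial_x m_{N+1}$, never $m_{N+1}$ itself, which is why learning the gradient is enough to close the system.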

preprint, 2022, arXiv

On the stability of strong-stability-preserving modified Patankar Runge-Kutta schemes

In this paper, we perform stability analysis for a class of second and third order accurate strong-stability-preserving modified Patankar Runge-Kutta (SSPMPRK) schemes, which were introduced in [4,5] and can be used to solve convection equations with stiff source terms, such as reactive Euler equations, with guaranteed positivity under the standard CFL condition due to the convection terms only. The analysis allows us to identify the range of free parameters in these SSPMPRK schemes in order to ensure stability. Numerical experiments are provided to demonstrate the validity of the analysis.
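For readers unfamiliar with Patankar-type schemes, the sketch below shows the first-order modified Patankar-Euler step on a two-component production-destruction system. It is a simpler relative of the SSPMPRK schemes analyzed in the paper, not the schemes themselves; the test system, the function name `mpe_step`, and the step size are assumptions for illustration.

```python
def mpe_step(y1, y2, dt):
    """One modified Patankar-Euler step for the system
        y1' = -y1*y2/(y1 + 1),   y2' = +y1*y2/(y1 + 1).

    Illustrative sketch; not an SSPMPRK scheme from the paper.
    """
    # Material flows from component 1 to component 2 (d_12 = p_21).
    flux = y1 * y2 / (y1 + 1.0)
    # The destruction term is weighted by y1_new / y1, which makes the
    # update implicit but solvable in closed form, with a positive result.
    y1_new = y1 / (1.0 + dt * flux / y1)
    # The matching production term carries the same weight, so the
    # sum y1 + y2 is conserved.
    y2_new = y2 + dt * flux * y1_new / y1
    return y1_new, y2_new


# Hypothetical initial data and a deliberately large time step.
y1, y2 = 1.0, 0.5
total0 = y1 + y2
history = [(y1, y2)]
for _ in range(20):
    y1, y2 = mpe_step(y1, y2, dt=5.0)
    history.append((y1, y2))
```

Both components stay positive and their sum stays constant however large `dt` is. Those two properties are what Patankar-type weighting is designed to provide.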

preprint, 2020, arXiv

An adaptive multiresolution discontinuous Galerkin method with artificial viscosity for scalar hyperbolic conservation laws in multidimensions

In this paper, we develop an adaptive multiresolution discontinuous Galerkin (DG) scheme for scalar hyperbolic conservation laws in multidimensions. Compared with previous work for linear hyperbolic equations \cite{guo2016transport, guo2017adaptive}, a class of interpolatory multiwavelets is applied to efficiently compute the nonlinear integrals over elements and edges in DG schemes. The resulting algorithm can therefore achieve computational complexity similar to that of the sparse grid DG method for smooth solutions. Theoretical and numerical studies are performed, taking into consideration accuracy and stability with regard to the choice of the interpolatory multiwavelets. Artificial viscosity is added to capture the shock and acts only on the leaf elements, taking advantage of the multiresolution representation. Adaptivity is realized by automatic error thresholding based on hierarchical surplus. Accuracy and robustness are demonstrated by several numerical tests.

preprint, 2020, arXiv

An adaptive multiresolution interior penalty discontinuous Galerkin method for wave equations in second order form

In this paper, we propose a class of adaptive multiresolution (also called adaptive sparse grid) discontinuous Galerkin (DG) methods for simulating scalar wave equations in second order form in space. The two key ingredients of the schemes are an interior penalty DG formulation in the adaptive function space and two classes of multiwavelets for achieving multiresolution. In particular, the orthonormal Alpert's multiwavelets are used to express the DG solution in terms of a hierarchical structure, and the interpolatory multiwavelets are further introduced to enhance computational efficiency in the presence of variable wave speed or nonlinear source. Some theoretical results on stability and accuracy of the proposed method are presented. Benchmark numerical tests in 2D and 3D are provided to validate the performance of the method.

preprint, 2020, arXiv

An adaptive multiresolution ultra-weak discontinuous Galerkin method for nonlinear Schrödinger equations

This paper develops a high order adaptive scheme for solving nonlinear Schrödinger equations. The solutions to such equations often exhibit solitary waves and local structures, which makes adaptivity essential in improving the simulation efficiency. Our scheme uses the ultra-weak discontinuous Galerkin (DG) formulation and belongs to the framework of adaptive multiresolution schemes. Various numerical experiments are presented to demonstrate the excellent capability of capturing the soliton waves and the blow-up phenomenon.

preprint, 2020, arXiv

Boundary treatment of high order Runge-Kutta methods for hyperbolic conservation laws

In \cite{ZH2019}, we developed a boundary treatment method for implicit-explicit (IMEX) Runge-Kutta (RK) methods for solving hyperbolic systems with source terms. Since IMEX RK methods include explicit ones as special cases, this boundary treatment method naturally applies to explicit methods as well. In this paper, we examine this boundary treatment method for the case of explicit RK schemes of arbitrary order applied to hyperbolic conservation laws. We show that the method not only preserves the accuracy of explicit RK schemes but also possesses good stability. This compares favourably to the inverse Lax-Wendroff method in \cite{TS2010,TWSN2012}, where analysis and numerical experiments have previously verified the presence of order reduction \cite{TS2010,TWSN2012}. In addition, we demonstrate that our method performs well for strong-stability-preserving (SSP) RK schemes involving negative coefficients and downwind spatial discretizations. It is numerically shown that, when boundary conditions are present and the proposed boundary treatment is used, SSP RK schemes with negative coefficients still allow for larger time steps than schemes with all non-negative coefficients. In this regard, our boundary treatment method is an effective supplement to SSP RK schemes with/without negative coefficients for initial-boundary value problems for hyperbolic conservation laws.
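As background for the SSP RK schemes mentioned above, here is a minimal sketch of the classical third-order SSP Runge-Kutta method (the Shu-Osher form, with non-negative coefficients) applied to linear advection with first-order upwind differences. It uses periodic boundaries, so it does not reproduce the boundary treatment studied in the paper; the grid size, CFL number, and function names are assumptions for illustration.

```python
def upwind_rhs(u, dx):
    """Upwind discretization of -u_x for u_t + u_x = 0, periodic in x."""
    # u[i - 1] with i = 0 wraps to the last cell, giving periodicity.
    return [-(u[i] - u[i - 1]) / dx for i in range(len(u))]


def ssprk3_step(u, dt, dx):
    """One step of the third-order SSP RK method in Shu-Osher form.

    Each stage is a convex combination of forward Euler steps.
    """
    def euler(v):
        return [vi + dt * ri for vi, ri in zip(v, upwind_rhs(v, dx))]

    u1 = euler(u)
    u2 = [0.75 * a + 0.25 * b for a, b in zip(u, euler(u1))]
    return [a / 3.0 + 2.0 / 3.0 * b for a, b in zip(u, euler(u2))]


# Hypothetical setup: a square pulse on [0, 1) with 50 cells, CFL = 0.8.
n = 50
dx = 1.0 / n
dt = 0.8 * dx
u = [1.0 if 10 <= i < 20 else 0.0 for i in range(n)]
mass0 = sum(u) * dx
for _ in range(40):
    u = ssprk3_step(u, dt, dx)
```

Because forward Euler with upwinding keeps the solution within its initial bounds when `dt <= dx`, and every stage is a convex combination of such steps, the pulse stays within [0, 1] throughout. Schemes with negative coefficients, as discussed in the abstract, additionally require downwind spatial operators, which this sketch omits.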