Source author record

Lexing Ying

Lexing Ying appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.NA; Numerical Analysis; physics.comp-ph; Machine Learning; math.OC; math.ST; Statistics Theory; physics.chem-ph; Artificial Intelligence; Computer Science and Game Theory; cond-mat.mtrl-sci; cond-mat.str-el; Distributed, Parallel, and Cluster Computing; math.PR; Methodology

Catalog footprint: 57 works, 15 topics, 4 close collaborators.

preprint · 2024 · arXiv

Multimodal Sampling via Approximate Symmetries

Sampling from multimodal distributions is a challenging task in scientific computing. When a distribution has an exact symmetry between the modes, direct jumps among them can accelerate sampling significantly. However, the distributions arising in most applications do not have exact symmetries. This paper considers distributions with approximate symmetries. We first construct an exactly symmetric reference distribution from the target one by averaging over the group orbit associated with the approximate symmetry. Next, we apply multilevel Monte Carlo methods by constructing a continuation path between the reference and target distributions. We discuss how to implement these steps with annealed importance sampling and tempered transitions. Compared with traditional multilevel methods, the proposed approach can be more effective since the reference and target distributions are much closer. Numerical results for Ising models are presented to illustrate the efficiency of the proposed method.
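The orbit-averaging step can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes a small open 1D Ising chain whose weak external field `h` breaks the global spin-flip symmetry, takes the hypothetical group G = {identity, global flip}, and builds the reference density p_ref(x) = (1/|G|) Σ_g p(g x). The multilevel continuation between reference and target is not shown.

```python
import itertools
import math

def ising_energy(spins, J=1.0, h=0.0):
    """Open 1D Ising chain with coupling J and uniform external field h."""
    pair = sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))
    return -J * pair - h * sum(spins)

def target(spins, beta=0.5, h=0.1):
    """Unnormalized target; the field h breaks the exact global spin-flip symmetry."""
    return math.exp(-beta * ising_energy(spins, h=h))

def global_flip(spins):
    return tuple(-s for s in spins)

# Hypothetical symmetry group G = {identity, global spin flip}.
GROUP = (tuple, global_flip)

def reference(spins):
    """Orbit average p_ref(x) = (1/|G|) * sum_{g in G} p(g x); exactly G-invariant."""
    return sum(target(g(spins)) for g in GROUP) / len(GROUP)
```

Because the sum runs over the whole group, `reference(x)` and `reference(global_flip(x))` add the same two terms, so the reference is exactly symmetric while staying close to the target when `h` is small.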

preprint · 2023 · arXiv

Variational Actor-Critic Algorithms

We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual. Besides vanilla gradient descent, which updates both the value function and the policy, we propose two variants, the clipping method and the flipping method, to speed up convergence. We also prove that, when the prefactor of the Bellman residual is sufficiently large, the fixed point of the algorithm is close to the optimal policy.

preprint · 2022 · arXiv

A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks

In this paper, we propose a semigroup method based on neural networks for solving high-dimensional elliptic partial differential equations (PDEs) and the associated eigenvalue problems. For the PDE problems, we reformulate the original equations as variational problems with the help of semigroup operators and then solve the variational problems with neural network (NN) parameterization. The main advantages are that no mixed second-order derivative computation is needed during stochastic gradient descent training and that the boundary conditions are taken into account automatically by the semigroup operator. Unlike popular methods such as PINN [raissi2019physics] and Deep Ritz [weinan2018deep], where the Dirichlet boundary condition is enforced solely through penalty functions and the true solution is therefore changed, the proposed method handles the boundary conditions without penalty functions and, thanks to the semigroup operator, gives the correct solution even when penalty functions are added. For eigenvalue problems, a primal-dual method is proposed that efficiently resolves the constraint with a simple scalar dual variable, resulting in a faster algorithm than the BSDE solver [han2020solving] in certain problems, such as the eigenvalue problem associated with the linear Schrödinger operator. Numerical results are provided to demonstrate the performance of the proposed methods.

preprint · 2022 · arXiv

Analytic continuation from limited noisy Matsubara data

This note proposes a new algorithm for estimating the spectral function from limited noisy Matsubara data. We consider both the molecular and condensed matter cases. In each case, the algorithm constructs an interpolant of the Matsubara data and uses conformal mapping and Prony's method to estimate the spectral function. Numerical results are provided to demonstrate the performance of the algorithm.

preprint · 2022 · arXiv

Annealed importance sampling for Ising models with mixed boundary conditions

This note introduces a method for sampling Ising models with mixed boundary conditions. As an application of annealed importance sampling and the Swendsen-Wang algorithm, the method adopts a sequence of intermediate distributions that keeps the temperature fixed but turns on the boundary condition gradually. The numerical results show that the variance of the sample weights is relatively small.
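The annealing schedule described above (fixed temperature, boundary term switched on gradually) can be sketched as follows. This is a toy version under stated assumptions, not the note's implementation: a 1D chain of 6 spins stands in for the 2D lattice, the mixed boundary is modeled by a +1 ghost spin on the left and a -1 ghost spin on the right, and single-spin Metropolis sweeps replace the Swendsen-Wang moves used in the note. The constants `N`, `J` and `BETA` are hypothetical.

```python
import itertools
import math
import random

N, J, BETA = 6, 1.0, 0.5  # hypothetical chain length, coupling, inverse temperature

def bulk_energy(s):
    return -J * sum(s[i] * s[i + 1] for i in range(N - 1))

def boundary_energy(s):
    # Mixed boundary: +1 ghost spin left of s[0], -1 ghost spin right of s[-1].
    return -J * (s[0] - s[-1])

def log_f(s, lam):
    """Unnormalized log density at fixed temperature; lam in [0, 1] turns the boundary on."""
    return -BETA * (bulk_energy(s) + lam * boundary_energy(s))

def sample_free_chain(rng):
    """Exact sample at lam = 0: neighbors agree with prob e^{bJ} / (e^{bJ} + e^{-bJ})."""
    p_same = math.exp(BETA * J) / (math.exp(BETA * J) + math.exp(-BETA * J))
    s = [rng.choice((-1, 1))]
    for _ in range(N - 1):
        s.append(s[-1] if rng.random() < p_same else -s[-1])
    return s

def metropolis_sweep(s, lam, rng):
    """Single-spin Metropolis sweep that leaves exp(log_f(., lam)) invariant."""
    for i in range(N):
        old = log_f(s, lam)
        s[i] = -s[i]
        if rng.random() >= math.exp(min(0.0, log_f(s, lam) - old)):
            s[i] = -s[i]  # reject: undo the flip
    return s

def ais_log_ratio(n_samples=2000, n_steps=10, seed=0):
    """AIS estimate of log(Z_1 / Z_0) along lam_k = k / n_steps."""
    rng = random.Random(seed)
    lams = [k / n_steps for k in range(n_steps + 1)]
    total = 0.0
    for _ in range(n_samples):
        s = sample_free_chain(rng)
        log_w = 0.0
        for k in range(1, n_steps + 1):
            log_w += log_f(s, lams[k]) - log_f(s, lams[k - 1])  # incremental weight
            s = metropolis_sweep(s, lams[k], rng)
        total += math.exp(log_w)
    return math.log(total / n_samples)

def exact_log_ratio():
    """Brute-force log(Z_1 / Z_0) over all 2^N configurations."""
    configs = list(itertools.product((-1, 1), repeat=N))
    z0 = sum(math.exp(log_f(c, 0.0)) for c in configs)
    z1 = sum(math.exp(log_f(c, 1.0)) for c in configs)
    return math.log(z1 / z0)
```

Since only the boundary term changes along the path, each sample's log weight is bounded by `2 * BETA * J` in magnitude, which keeps the weight variance small on this toy chain; the brute-force ratio is only feasible because the chain is tiny.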

preprint · 2022 · arXiv

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

A quadratic approximation of neural network loss landscapes has been used extensively to study the optimization process of these networks. Though it usually holds in a very small neighborhood of the minimum, it cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implications for optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss shows several separate scales clearly. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [5] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay with simple examples. Finally, we study the origin of the multiscale structure and propose that the non-convexity of the models and the non-uniformity of training data are among the causes. By constructing a two-layer neural network problem, we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth and multiple separate scales.

preprint · 2022 · arXiv

Correcting Convexity Bias in Function and Functional Estimate

A general framework with a series of different methods is proposed to improve the estimates of convex function (or functional) values when only noisy observations of the true input are available. Technically, our methods capture the bias introduced by the convexity and remove this bias from a baseline estimate. Theoretical analysis shows that the proposed methods can strictly reduce the expected estimation error under mild conditions. When applied, the methods require no specific knowledge about the problem except the convexity and the ability to evaluate the function. Therefore, they can serve as off-the-shelf tools to obtain good estimates for a wide range of problems, including optimization problems with random objective functions or constraints, and functionals of probability distributions such as the entropy and the Wasserstein distance. Numerical experiments on a wide variety of problems show that our methods can significantly improve the quality of the estimate compared with the baseline method.
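The convexity bias itself, and one generic way to remove it, can be shown in a few lines. The sketch below is a simple bootstrap-style correction, not one of the paper's methods: it assumes the noise distribution can be sampled (here a hypothetical two-point noise), estimates the bias by re-noising the observed input, and subtracts that estimate from the plug-in value f(x̂).

```python
import statistics

def naive_estimate(f, x_hat):
    """Plug-in estimate f(x_hat); biased upward for convex f by Jensen's inequality."""
    return f(x_hat)

def bootstrap_corrected(f, x_hat, noise_draws):
    """Subtract an estimated convexity bias from the plug-in value:
    f(x_hat) - (mean_i f(x_hat + e_i) - f(x_hat)) = 2 f(x_hat) - mean_i f(x_hat + e_i).
    Assumes noise_draws are samples from the (known) observation noise."""
    reperturbed = statistics.fmean(f(x_hat + e) for e in noise_draws)
    return 2.0 * f(x_hat) - reperturbed
```

For f(x) = x² with true input 1 and noise ±0.5, the plug-in estimate averages to 1 + 0.25, while the corrected one averages to exactly 1; for non-quadratic f the correction only removes the bias approximately.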

preprint · 2022 · arXiv

Double Flip Move for Ising Models with Mixed Boundary Conditions

This note introduces the double flip move for accelerating the Swendsen-Wang algorithm for Ising models with mixed boundary conditions below the critical temperature. The double flip move consists of a geometric flip of the spin lattice followed by a spin value flip. Both the symmetric and approximately symmetric models are considered. We prove the detailed balance of the double flip move and demonstrate its empirical efficiency in mixing.
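The double flip move is easy to state in one dimension. In the sketch below (a 1D analogue assumed for illustration, whereas the note works on a 2D lattice inside Swendsen-Wang), the chain carries a +1 ghost spin on the left and a -1 ghost spin on the right. Mirroring the chain swaps the two boundaries, and negating every spin swaps them back, so the energy of the exactly symmetric model is unchanged. Because the move is also an involution, proposing it and always accepting satisfies detailed balance for this symmetric case.

```python
import itertools

J = 1.0  # hypothetical ferromagnetic coupling

def mixed_boundary_energy(s):
    """Open 1D chain with a +1 ghost spin on the left and a -1 ghost spin on the right."""
    bulk = sum(s[i] * s[i + 1] for i in range(len(s) - 1))
    return -J * (bulk + s[0] - s[-1])

def double_flip(s):
    """Geometric flip (mirror the chain) followed by a spin value flip."""
    return tuple(-x for x in reversed(s))
```

For example, `double_flip((1, 1, -1))` first mirrors to `(-1, 1, 1)` and then negates to `(1, -1, -1)`, which has the same energy as the original configuration.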

preprint · 2022 · arXiv

Enterprise-Scale Search: Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Tree-based models underpin many modern semantic search engines and recommender systems due to their sub-linear inference times. In industrial applications, these models operate at extreme scales, where every bit of performance is critical. Memory constraints at extreme scales also require that models be sparse, hence tree-based models are often back-ended by sparse matrix algebra routines. However, there are currently no sparse matrix techniques specifically designed for the sparsity structure one encounters in tree-based models for extreme multi-label ranking/classification (XMR/XMC) problems. To address this issue, we present the masked sparse chunk multiplication (MSCM) technique, a sparse matrix technique specifically tailored to XMR trees. MSCM is easy to implement, embarrassingly parallelizable, and offers a significant performance boost to any existing tree inference pipeline at no cost. We perform a comprehensive study of MSCM applied to several different sparse inference schemes and benchmark our methods on a general purpose extreme multi-label ranking framework. We observe that MSCM gives consistently dramatic speedups across both the online and batch inference settings, single- and multi-threaded settings, and on many different tree models and datasets. To demonstrate its utility in industrial applications, we apply MSCM to an enterprise-scale semantic product search problem with 100 million products and achieve sub-millisecond latency of 0.88 ms per query on a single thread -- an 8x reduction in latency over vanilla inference techniques. The MSCM technique requires absolutely no sacrifices to model accuracy as it gives exactly the same results as standard sparse matrix techniques. Therefore, we believe that MSCM will enable users of XMR trees to save a substantial amount of compute resources in their inference pipelines at very little cost.

preprint · 2022 · arXiv

Operator Shifting for General Noisy Matrix Systems

In the computational sciences, one must often estimate model parameters from data subject to noise and uncertainty, leading to inaccurate results. In order to improve the accuracy of models with noisy parameters, we consider the problem of reducing error in a linear system with the operator corrupted by noise. Our contribution in this paper is to extend the elliptic operator shifting framework from Etter, Ying '20 to the general nonsymmetric matrix case. Roughly, the operator shifting technique is a matrix analogue of the James-Stein estimator. The key insight is that a shift of the matrix inverse estimate in an appropriately chosen direction will reduce average error. In our extension, we interrogate a number of questions -- namely, whether or not shifting towards the origin for general matrix inverses always reduces error as it does in the elliptic case. We show that this is usually the case, but that there are three key features of the general nonsingular matrices that allow for adversarial examples not possible in the symmetric case. We prove that when these adversarial possibilities are eliminated by the assumption of noise symmetry and the use of the residual norm as the error metric, the optimal shift is always towards the origin, mirroring results from Etter, Ying '20. We also investigate behavior in the small noise regime and other scenarios. We conclude by presenting numerical experiments (with accompanying source code) inspired by Reinforcement Learning to demonstrate that operator shifting can yield substantial reductions in error.

preprint · 2022 · arXiv

Operator Shifting for Noisy Elliptic Systems

In the computational sciences, one must often estimate model parameters from data subject to noise and uncertainty, leading to inaccurate results. In order to improve the accuracy of models with noisy parameters, we consider the problem of reducing error in an elliptic linear system with the operator corrupted by noise. We assume the noise preserves positive definiteness, but otherwise we make no additional assumptions about the structure of the noise. Under these assumptions, we propose the operator shifting framework, a collection of easy-to-implement algorithms that augment a noisy inverse operator by subtracting an additional auxiliary term. In a similar fashion to the James-Stein estimator, this has the effect of drawing the noisy inverse operator closer to the ground truth, and hence reduces error by reducing both bias and variance. We develop bootstrap Monte Carlo algorithms to estimate the required augmentation magnitude for optimal error reduction in the noisy system. To improve the tractability of these algorithms, we propose several approximate polynomial expansions for the operator inverse, and prove desirable convergence and monotonicity properties for these expansions. We also prove theorems that quantify the error reduction obtained by operator augmentation. In addition to theoretical results, we provide a set of numerical experiments on four different graph and grid Laplacian systems that all demonstrate the effectiveness of our method.
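The shift-toward-the-origin idea and its bootstrap calibration can be seen already in the scalar (1×1) case. The sketch below is an illustration under assumptions, not the paper's matrix algorithm: the true operator is the number `a_true`, the noise is a hypothetical positive-preserving two-point distribution, the shifted estimate is (1 - β)/â, and β is estimated by treating â as the truth and re-noising it, which minimizes the bootstrap mean squared error in closed form. The polynomial expansions used for matrices are not needed here.

```python
import statistics

def shifted_inverse(a_hat, beta):
    """Scalar operator shifting: 1/a_hat - beta * (1/a_hat), i.e. shrink toward 0."""
    return (1.0 - beta) / a_hat

def bootstrap_shift(a_hat, noise_draws):
    """Bootstrap estimate of the optimal shift, treating a_hat as the true operator.
    Minimizing mean_e ((1 - beta)/(a_hat + e) - 1/a_hat)^2 over beta gives
    1 - beta = (1/a_hat) * mean(1/(a_hat + e)) / mean(1/(a_hat + e)^2)."""
    inv = [1.0 / (a_hat + e) for e in noise_draws]
    scale = (1.0 / a_hat) * statistics.fmean(inv) / statistics.fmean(x * x for x in inv)
    return 1.0 - scale

def mean_sq_error(a_true, noise_draws, beta_fn):
    """Average squared error of the shifted inverse over the noise draws."""
    errs = [(shifted_inverse(a_true + e, beta_fn(a_true + e)) - 1.0 / a_true) ** 2
            for e in noise_draws]
    return statistics.fmean(errs)
```

With `a_true = 1.0` and noise ±0.25, the unshifted inverse (β = 0) has mean squared error of about 0.076, while the bootstrap-shifted one drops to about 0.036; both bootstrap shifts are positive, i.e. toward the origin.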

preprint · 2022 · arXiv

Pole recovery from noisy data on imaginary axis

This note proposes an algorithm for identifying the poles and residues of a meromorphic function from its noisy values on the imaginary axis. The algorithm uses the Möbius transform and Prony's method in the frequency domain. Numerical results are provided to demonstrate the performance of the algorithm.
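At its core, Prony's method fits samples f_k = Σ_j c_j z_j^k by a linear recurrence whose characteristic roots are the nodes z_j. The minimal sketch below covers only the noise-free two-term case from four samples (2×2 linear prediction solved by Cramer's rule, then a quadratic formula); the Möbius transform, noise handling and general term count used in the note are omitted.

```python
import cmath

def prony_two_terms(f):
    """Recover [(z1, c1), (z2, c2)] from f[k] = c1*z1**k + c2*z2**k, k = 0..3 (noise-free)."""
    f0, f1, f2, f3 = f[:4]
    # Linear prediction f[k+2] = p1*f[k+1] + p0*f[k] for k = 0, 1 (Cramer's rule).
    det = f1 * f1 - f0 * f2
    p1 = (f2 * f1 - f0 * f3) / det
    p0 = (f1 * f3 - f2 * f2) / det
    # Nodes are the roots of the characteristic polynomial z^2 - p1*z - p0.
    disc = cmath.sqrt(p1 * p1 + 4 * p0)
    z1, z2 = (p1 + disc) / 2, (p1 - disc) / 2
    # Amplitudes from f0 = c1 + c2 and f1 = c1*z1 + c2*z2.
    c2 = (f1 - z1 * f0) / (z2 - z1)
    c1 = f0 - c2
    return [(z1, c1), (z2, c2)]
```

For samples of 2·0.5^k + 3·0.8^k, the recurrence coefficients are p1 = 1.3 and p0 = -0.4, whose characteristic roots are 0.8 and 0.5, and the amplitudes 3 and 2 follow from the first two samples.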

preprint · 2022 · arXiv

Provably convergent quasistatic dynamics for mean-field two-player zero-sum games

In this paper, we study the problem of finding the mixed Nash equilibrium of mean-field two-player zero-sum games. Solving this problem requires optimizing over two probability distributions. We consider a quasistatic Wasserstein gradient flow dynamics in which one probability distribution follows the Wasserstein gradient flow while the other is always at equilibrium. Theoretical analysis of this dynamics shows its convergence to the mixed Nash equilibrium under mild conditions. Inspired by the continuous dynamics of probability distributions, we derive a quasistatic Langevin gradient descent method with inner-outer iterations and test the method on different problems, including training a mixture of GANs.

preprint · 2021 · arXiv

A Simple Multiscale Method for Mean Field Games

This paper proposes a multiscale method for computing the numerical solution of mean field games that accelerates convergence and addresses the problem of determining the initial guess. Starting from an approximate solution at the coarsest level, the method constructs approximations on successively finer grids via alternating sweeping, which not only allows for the use of classical time marching numerical schemes but also enables applications to both local and nonlocal problems. At each level, numerical relaxation is used to stabilize the iterative process. A second-order discretization scheme is derived for higher-order convergence. Numerical examples are provided to demonstrate the efficiency of the proposed method in both local and nonlocal, 1-dimensional and 2-dimensional cases.

preprint · 2021 · arXiv

How to Learn when Data Reacts to Your Model: Performative Gradient Descent

Performative distribution shift captures the setting where the choice of which ML model is deployed changes the data distribution. For example, a bank that uses the number of open credit lines to determine a customer's risk of default on a loan may induce customers to open more credit lines in order to improve their chances of being approved. Because of the interactions between the model and the data distribution, finding the optimal model parameters is challenging. Prior work in this area has focused on finding stable points, which can be far from optimal. Here we introduce performative gradient descent (PerfGD), which is the first algorithm that provably converges to the performatively optimal point. PerfGD explicitly captures how changes in the model affect the data distribution and is simple to use. We support our findings with theory and experiments.
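The gap between stable and performatively optimal points shows up already in a one-parameter toy. The sketch below is a hypothetical setup, not the paper's experiments: the loss is ℓ(θ, z) = -θz + θ²/2, the deployed model induces data with mean μ(θ) = μ0 + εθ, and expectations are computed exactly instead of from samples. In the spirit of PerfGD, the learner estimates dμ/dθ by finite differences between consecutive deployments and descends the total derivative of the risk R(θ) = -θμ(θ) + θ²/2. Repeated retraining instead converges to the stable point μ0/(1 - ε).

```python
def perfgd_toy(mu0=1.0, eps=0.2, lr=0.5, steps=200):
    """PerfGD-style iteration on a toy problem with exact expectations (hypothetical model).
    The learner observes the deployed mean mu after each step but never eps itself."""
    def deployed_mean(theta):
        return mu0 + eps * theta

    theta_prev, theta = 0.0, 0.1  # two initial deployments
    mu_prev, mu = deployed_mean(theta_prev), deployed_mean(theta)
    dmu_dtheta = 0.0
    for _ in range(steps):
        # Finite-difference estimate of how the data distribution reacts to the model;
        # skipped once steps become tiny, to avoid cancellation error.
        if abs(theta - theta_prev) > 1e-8:
            dmu_dtheta = (mu - mu_prev) / (theta - theta_prev)
        # Total derivative of R(theta) = -theta * mu(theta) + theta**2 / 2.
        grad = (theta - mu) - theta * dmu_dtheta
        theta_prev, mu_prev = theta, mu
        theta -= lr * grad
        mu = deployed_mean(theta)
    return theta

def repeated_retraining(mu0=1.0, eps=0.2, steps=200):
    """Baseline: minimize the loss on the current distribution, i.e. theta <- mu(theta)."""
    theta = 0.0
    for _ in range(steps):
        theta = mu0 + eps * theta
    return theta
```

With the defaults, the performative optimum is μ0/(1 - 2ε) ≈ 1.667, which `perfgd_toy()` approaches, whereas `repeated_retraining()` settles at the stable point 1.25.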

preprint · 2021 · arXiv

Multi-Level Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data

In scientific machine learning, regression networks have recently been applied to approximate solution maps (e.g., the potential-to-ground-state map of the Schrödinger equation). However, to reduce the generalization error, the regression network needs to be fit on a large number of training samples (e.g., a collection of potential-ground state pairs). The training samples can be produced by running numerical solvers, which takes much time in many applications. In this paper, we aim to reduce the generalization error without spending more time generating training samples. Inspired by few-shot learning techniques, we develop the Multi-Level Fine-Tuning algorithm by introducing levels of training: we first train the regression network on samples generated at the coarsest grid and then successively fine-tune the network on samples generated at finer grids. Within the same amount of time, numerical solvers generate more samples on coarse grids than on fine grids. We demonstrate a significant reduction of generalization error in numerical experiments on challenging problems with oscillations, discontinuities, or rough coefficients. Further analysis can be conducted in the Neural Tangent Kernel regime, and we provide practical estimators for the generalization error. The number of training samples at different levels can be optimized for the smallest estimated generalization error under the constraint of a budget for training data. The optimized distribution of budget over levels provides practical guidance with theoretical insight, as in the celebrated Multi-Level Monte Carlo algorithm.

preprint · 2021 · arXiv

Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy for selecting multiple arms. We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step. In the extreme setting, where the total number of arms can be in the millions, we propose a practically-motivated arm hierarchy model that induces a certain structure in mean rewards to ensure statistical and computational efficiency. The hierarchical structure allows for an exponential reduction in the number of relevant arms for each context, thus resulting in a regret guarantee of $O(k\sqrt{(\log A-k+1)T \log (|\mathcal{F}|T)})$. Finally, we implement our algorithm using a hierarchical linear function class and show superior performance with respect to well-known benchmarks on simulated bandit feedback experiments using extreme multi-label classification datasets. On a dataset with three million arms, our reduction scheme has an average inference time of only 7.9 milliseconds, which is a 100x improvement.
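For reference, the standard single-arm Inverse Gap Weighting rule that the algorithm builds on assigns each non-greedy arm a probability inversely proportional to its predicted reward gap and gives the remaining mass to the greedy arm. The sketch below shows only that single-arm rule; the paper's extension to selecting k arms and its arm hierarchy are not reproduced, and `gamma` is a hypothetical exploration parameter.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Single-arm IGW: p_a = 1 / (A + gamma * (yhat_best - yhat_a)) for a != best,
    and the greedy arm receives the remaining probability mass."""
    A = len(predicted_rewards)
    best = max(range(A), key=lambda a: predicted_rewards[a])
    probs = [0.0] * A
    for a in range(A):
        if a != best:
            probs[a] = 1.0 / (A + gamma * (predicted_rewards[best] - predicted_rewards[a]))
    # Each non-greedy arm gets at most 1/A, so the greedy arm keeps at least 1/A.
    probs[best] = 1.0 - sum(probs)
    return probs
```

With predicted rewards `[0.1, 0.5, 0.3]` and `gamma = 10.0`, the non-greedy arms receive 1/7 and 1/5, and the greedy arm receives the rest; larger `gamma` concentrates more mass on the greedy arm.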

preprint · 2020 · arXiv

A heuristic independent particle approximation to determinantal point processes

A determinantal point process is a stochastic point process that is commonly used to capture negative correlations. It has become increasingly popular in machine learning in recent years. However, sampling a determinantal point process remains a computationally intensive task. This note introduces a heuristic independent particle approximation to determinantal point processes. The approximation is based on the physical intuition of fermions and is implemented using standard numerical linear algebra routines. Sampling from this independent particle approximation can be performed at a negligible cost. Numerical results are provided to demonstrate the performance of the proposed algorithm.

preprint · 2020 · arXiv

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks although the optimization landscape is known to be highly non-convex. To understand the success of SGD for training deep neural networks, this work presents a mean-field analysis of deep residual networks, based on a line of works that interpret the continuum limit of the deep residual network as an ordinary differential equation when the network capacity tends to infinity. Specifically, we propose a new continuum limit of deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global. This characterization enables us to derive the first global convergence result for multilayer neural networks in the mean-field regime. Furthermore, without assuming the convexity of the loss landscape, our proof relies on a zero-loss assumption at the global minimizer that can be achieved when the model shares a universal approximation property. Key to our result is the observation that a deep residual network resembles a shallow network ensemble, i.e. a two-layer network. We bound the difference between the shallow network and our ResNet model via the adjoint sensitivity method, which enables us to apply existing mean-field analyses of two-layer networks to deep networks. Furthermore, we propose several novel training schemes based on the new continuous model, including one training procedure that switches the order of the residual blocks and results in strong empirical performance on the benchmark datasets.

preprint · 2020 · arXiv

A Sharp Convergence Rate for the Asynchronous Stochastic Gradient Descent

We give a sharp convergence rate for the asynchronous stochastic gradient descent (ASGD) algorithms when the loss function is a perturbed quadratic function based on the stochastic modified equations introduced in [An et al. Stochastic modified equations for the asynchronous stochastic gradient descent, arXiv:1805.08244]. We prove that when the number of local workers is larger than the expected staleness, then ASGD is more efficient than stochastic gradient descent. Our theoretical result also suggests that longer delays result in slower convergence rate. Besides, the learning rate cannot be smaller than a threshold inversely proportional to the expected staleness.

preprint · 2020 · arXiv

A simple solver for the fractional Laplacian in multiple dimensions

We present a simple discretization scheme for the hypersingular integral representation of the fractional Laplace operator and a solver for the corresponding fractional Laplacian problem. Through singularity subtraction, we obtain a regularized integrand that is amenable to the trapezoidal rule with equispaced nodes, assuming a high degree of regularity in the underlying function (i.e., $u\in C^6(\mathbb{R}^d)$). The resulting quadrature scheme gives a discrete operator on a regular grid that is translation-invariant and thus can be applied quickly with the fast Fourier transform. For discretizations of problems related to space-fractional diffusion on bounded domains, we observe that the underlying linear system can be efficiently solved via preconditioned Krylov methods with a preconditioner based on the finite-difference (non-fractional) Laplacian. We show numerical results illustrating the error of our simple scheme as well as the efficiency of our preconditioning approach, both for the elliptic (steady-state) fractional diffusion problem and the time-dependent problem.

preprint · 2020 · arXiv

Borrowing From the Future: Addressing Double Sampling in Model-free Control

In model-free reinforcement learning, the temporal difference method and its variants become unstable when combined with nonlinear function approximations. Bellman residual minimization with stochastic gradient descent (SGD) is more stable, but it suffers from the double sampling problem: given the current state, two independent samples for the next state are required, but often only one sample is available. Recently, the authors of [Zhu et al, 2020] introduced the borrowing from the future (BFF) algorithm to address this issue for the prediction problem. The main idea is to borrow extra randomness from the future to approximately re-sample the next state when the underlying dynamics of the problem are sufficiently smooth. This paper extends the BFF algorithm to action-value function based model-free control. We prove that BFF is close to unbiased SGD when the underlying dynamics vary slowly with respect to actions. We confirm our theoretical findings with numerical simulations.
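The double sampling issue and the BFF remedy can be made concrete for a scalar state. The sketch below reflects our reading of the BFF construction in the prediction setting (state values, not the action values this paper targets): a surrogate for a second, independent next state is formed as s_t + (s_{t+2} - s_{t+1}), borrowing the future increment. The linear value function V(s) = w·s and all numbers are hypothetical. The Bellman residual gradient then evaluates the residual on the observed next state and the derivative factor on the surrogate, mimicking two independent samples.

```python
def bff_next_state(s_t, s_t1, s_t2):
    """BFF surrogate for a second draw of the next state: s_t + (s_{t+2} - s_{t+1})."""
    return s_t + (s_t2 - s_t1)

def bellman_residual_grad(w, s, s_next, s_next_alt, r, gamma):
    """Stochastic gradient of 0.5 * E[delta^2] for V(s) = w * s using two next-state samples:
    the residual uses s_next, the derivative factor uses s_next_alt."""
    delta = r + gamma * w * s_next - w * s
    return delta * (gamma * s_next_alt - s)
```

For a trajectory 0, 1, 3 the surrogate next state is 2; with w = 0.5, r = 1 and γ = 0.9 the residual is 1.45 and the gradient estimate is 1.45 × 1.8 ≈ 2.61. The surrogate is only accurate when the increments vary slowly, which is the smoothness assumption stated above.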

preprint · 2020 · arXiv

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $O\Big(\frac{N \log N}{P} + \alpha\log P + \beta\log^2 P \Big)$ for $\mathcal{H}$-matrices under the weak admissibility condition, where $N$ is the matrix size, $\alpha$ denotes the latency, and $\beta$ denotes the inverse bandwidth. Numerically, our algorithm is applied to both two- and three-dimensional problems of various sizes on various numbers of processes. On thousands of processes, good parallel efficiency is still observed.

preprint · 2020 · arXiv

Meta-learning Pseudo-differential Operators with Deep Neural Networks

This paper introduces a meta-learning approach for parameterized pseudo-differential operators with deep neural networks. With the help of the nonstandard wavelet form, the pseudo-differential operators can be approximated in a compressed form with a collection of vectors. The nonlinear map from the parameter to this collection of vectors and the wavelet transform are learned together from a small number of matrix-vector multiplications of the pseudo-differential operator. Numerical results for Green's functions of elliptic partial differential equations and the radiative transfer equations demonstrate the efficiency and accuracy of the proposed approach.

preprint · 2020 · arXiv

Mirror Descent Algorithms for Minimizing Interacting Free Energy

This note considers the problem of minimizing interacting free energy. Motivated by the mirror descent algorithm, for a given interacting free energy, we propose a descent dynamics with a novel metric that takes into consideration the reference measure and the interacting term. This metric naturally suggests a monotone reparameterization of the probability measure. By discretizing the reparameterized descent dynamics with the explicit Euler method, we arrive at a new mirror-descent-type algorithm for minimizing interacting free energy. Numerical results are included to demonstrate the efficiency of the proposed algorithms.
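As a point of comparison, the classical entropic mirror descent that motivates the note can be written for a discrete analogue of the interacting free energy, F(p) = Σ p_i log p_i + Σ V_i p_i + ½ Σ W_ij p_i p_j on the probability simplex. This sketch uses the standard KL geometry, not the new metric and monotone reparameterization proposed in the note; the potentials `V` and `W` are hypothetical. Each step multiplies p by exp(-η ∇F) and renormalizes, and a fixed point satisfies p_i ∝ exp(-V_i - (Wp)_i).

```python
import math

def free_energy_grad(p, V, W):
    """Gradient of F(p) = sum p log p + sum V p + 0.5 * p^T W p (W symmetric)."""
    n = len(p)
    return [math.log(p[i]) + 1.0 + V[i] + sum(W[i][j] * p[j] for j in range(n))
            for i in range(n)]

def mirror_descent_step(p, V, W, eta):
    """Entropic mirror descent: multiplicative update followed by renormalization."""
    g = free_energy_grad(p, V, W)
    unnorm = [p[i] * math.exp(-eta * g[i]) for i in range(len(p))]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def minimize(V, W, eta=0.5, steps=500):
    p = [1.0 / len(V)] * len(V)  # start from the uniform distribution
    for _ in range(steps):
        p = mirror_descent_step(p, V, W, eta)
    return p
```

With `W` set to zero the iteration converges to the Gibbs distribution p ∝ exp(-V); a small symmetric interaction only shifts the fixed point, which can be checked by verifying that log p_i + V_i + (Wp)_i is the same for every i.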

preprint · 2020 · arXiv

Natural Gradient for Combined Loss Using Wavelets

Natural gradients have been widely used in optimization of loss functionals over probability space, with important examples such as Fisher-Rao gradient descent for Kullback-Leibler divergence, Wasserstein gradient descent for transport-related functionals, and Mahalanobis gradient descent for quadratic loss functionals. This note considers the situation in which the loss is a convex linear combination of these examples. We propose a new natural gradient algorithm by utilizing compactly supported wavelets to diagonalize approximately the Hessian of the combined loss. Numerical results are included to demonstrate the efficiency of the proposed algorithm.

preprint · 2020 · arXiv

Semidefinite relaxation of multi-marginal optimal transport for strictly correlated electrons in second quantization

We consider the strictly correlated electron (SCE) limit of the fermionic quantum many-body problem in the second-quantized formalism. This limit gives rise to a multi-marginal optimal transport (MMOT) problem. Here the marginal state space for our MMOT problem is the binary set $\{0,1\}$, and the number of marginals is the number $L$ of sites in the model. The costs of storing and computing the exact solution of the MMOT problem both scale exponentially with respect to $L$. We propose an efficient convex relaxation which can be solved by semidefinite programming (SDP). In particular, the semidefinite constraint is only of size $2L\times 2L$. Moreover, the SDP-based method yields an approximation of the dual potential needed to perform the self-consistent field iteration in the so-called Kohn-Sham SCE framework, which, once converged, yields a lower bound for the total energy of the system. We demonstrate the effectiveness of our methods on spinless and spinful Hubbard-type models. Numerical results indicate that our relaxation methods yield tight lower bounds for the optimal cost, in the sense that the error due to the semidefinite relaxation is much smaller than the intrinsic modeling error of the Kohn-Sham SCE method. We also describe how our relaxation methods generalize to arbitrary MMOT problems with pairwise cost functions.

preprint · 2019 · arXiv

Solving Electrical Impedance Tomography with Deep Learning

This paper introduces a new approach for solving electrical impedance tomography (EIT) problems using deep neural networks. The mathematical problem of EIT is to invert the electrical conductivity from the Dirichlet-to-Neumann (DtN) map. Both the forward map from the electrical conductivity to the DtN map and the inverse map are high-dimensional and nonlinear. Motivated by the linear perturbative analysis of the forward map and based on a numerically low-rank property, we propose compact neural network architectures for the forward and inverse maps for both 2D and 3D problems. Numerical results demonstrate the efficiency of the proposed neural networks.

preprint · 2019 · arXiv

Stochastic modified equations for the asynchronous stochastic gradient descent

We propose stochastic modified equations (SME) for modeling asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous time limit, as well as the SME's precise prediction of the trajectories of ASGD with various forcing terms. As an application of the SME, we propose an optimal mini-batching strategy for ASGD by solving the optimal control problem of the associated SME.

preprint · 2016 · arXiv

Adaptively compressed polarizability operator for accelerating large scale ab initio phonon calculations

Phonon calculations based on first principle electronic structure theory, such as the Kohn-Sham density functional theory, have wide applications in physics, chemistry and material science. The computational cost of first principle phonon calculations typically scales steeply as $\mathcal{O}(N_e^4)$, where $N_e$ is the number of electrons in the system. In this work, we develop a new method to reduce the computational complexity of computing the full dynamical matrix, and hence the phonon spectrum, to $\mathcal{O}(N_e^3)$. The key concept for achieving this is to compress the polarizability operator adaptively with respect to the perturbation of the potential due to the change of the atomic configuration. Such adaptively compressed polarizability operator (ACP) allows accurate computation of the phonon spectrum. The reduction of complexity only weakly depends on the size of the band gap, and our method is applicable to insulators as well as semiconductors with small band gaps. We demonstrate the effectiveness of our method using one-dimensional and two-dimensional model problems.

preprint2016arXiv

Additive Sweeping Preconditioner for the Helmholtz Equation

We introduce a new additive sweeping preconditioner for the Helmholtz equation based on the perfectly matched layer (PML). This method divides the domain of interest into thin layers and proposes a new transmission condition between the subdomains where the emphasis is on the boundary values of the intermediate waves. This approach can be viewed as an effective approximation of an additive decomposition of the solution operator. When combined with the standard GMRES solver, the iteration number is essentially independent of the frequency. Several numerical examples are tested to show the efficiency of this new approach.

preprint2016arXiv

SCDM-k: Localized orbitals for solids via selected columns of the density matrix

The recently developed selected columns of the density matrix (SCDM) method [J. Chem. Theory Comput. 11, 1463, 2015] is a simple, robust, efficient and highly parallelizable method for constructing localized orbitals from a set of delocalized Kohn-Sham orbitals for insulators and semiconductors with $Γ$ point sampling of the Brillouin zone. In this work we generalize the SCDM method to Kohn-Sham density functional theory calculations with k-point sampling of the Brillouin zone, which is needed for more general electronic structure calculations for solids. We demonstrate that our new method, called SCDM-k, is by construction gauge independent and is a natural way to describe localized orbitals. SCDM-k computes localized orbitals without the use of an optimization procedure, and thus does not suffer from the possibility of being trapped in a local minimum. Furthermore, the computational complexity of using SCDM-k to construct orthogonal and localized orbitals scales as $\mathcal{O}(N \log N)$ where $N$ is the total number of k-points in the Brillouin zone. SCDM-k is therefore efficient even when a large number of k-points are used for Brillouin zone sampling. We demonstrate the numerical performance of SCDM-k using systems with model potentials in two and three dimensions.

preprint2016arXiv

Tensor Network Skeletonization

We introduce a new coarse-graining algorithm, tensor network skeletonization, for the numerical computation of tensor networks. This approach utilizes a structure-preserving skeletonization procedure to remove short-range correlations effectively at every scale. This approach is first presented in the setting of 2D statistical Ising model and is then extended to higher dimensional tensor networks and disordered systems. When applied to the Euclidean path integral formulation, this approach also gives rise to new efficient representations of the ground states for 1D and 2D quantum Ising models.

preprint2015arXiv

A Multiscale Butterfly Algorithm for Multidimensional Fourier Integral Operators

This paper presents an efficient multiscale butterfly algorithm for computing Fourier integral operators (FIOs) of the form $(\mathcal{L} f)(x) = \int_{\mathbb{R}^d} a(x,\xi)\, e^{2\pi \imath \Phi(x,\xi)} \hat{f}(\xi)\, d\xi$, where $\Phi(x,\xi)$ is a phase function, $a(x,\xi)$ is an amplitude function, and $f(x)$ is a given input. The frequency domain is hierarchically decomposed into a union of Cartesian coronas. The integral kernel $a(x,\xi)\, e^{2\pi \imath \Phi(x,\xi)}$ in each corona satisfies a special low-rank property that enables the application of a butterfly algorithm on the Cartesian phase-space grid. This leads to an algorithm with quasi-linear operation complexity and linear memory complexity. Unlike previous butterfly methods for FIOs, this new approach is simple and reduces the computational cost by avoiding extra coordinate transformations. Numerical examples in two and three dimensions are provided to demonstrate the practical advantages of the new algorithm.

preprint2015arXiv

Butterfly Factorization

The paper introduces the butterfly factorization as a data-sparse approximation for the matrices that satisfy a complementary low-rank property. The factorization can be constructed efficiently if either fast algorithms for applying the matrix and its adjoint are available or the entries of the matrix can be sampled individually. For an $N \times N$ matrix, the resulting factorization is a product of $O(\log N)$ sparse matrices, each with $O(N)$ non-zero entries. Hence, it can be applied rapidly in $O(N\log N)$ operations. Numerical results are provided to demonstrate the effectiveness of the butterfly factorization and its construction algorithms.
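
The discrete Fourier transform matrix is the standard example of a matrix with the complementary low-rank property, so it offers a quick way to see what the property means. The NumPy sketch below (helper names are hypothetical) measures the numerical rank of contiguous DFT blocks whose row count times column count equals $N$. It illustrates only the property that the factorization relies on, not the paper's construction algorithms.

```python
import numpy as np


def numerical_rank(block: np.ndarray, rel_tol: float = 1e-10) -> int:
    """Number of singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


def dft_block(n: int, row_start: int, num_rows: int,
              col_start: int, num_cols: int) -> np.ndarray:
    """Contiguous block of the n x n DFT matrix F[j, k] = exp(-2*pi*i*j*k/n)."""
    j = np.arange(row_start, row_start + num_rows)[:, None]
    k = np.arange(col_start, col_start + num_cols)[None, :]
    phase = (j * k) % n  # reduce the integer phase first to limit round-off
    return np.exp(-2j * np.pi * phase / n)


# Blocks with (rows) x (cols) = n: the rank stays small and roughly constant
# as n grows, even though the blocks themselves get larger.
for n, m in [(1024, 32), (4096, 64), (16384, 128)]:
    print(n, m, numerical_rank(dft_block(n, n // 3, m, n // 5, m)))

# Contrast: 16 full rows of a 256-point DFT are mutually orthogonal, so this
# 16 x 256 block has full rank 16 and admits no compression.
print(numerical_rank(dft_block(256, 0, 16, 0, 256)))
```

The 1e-10 relative tolerance is an arbitrary choice for this illustration.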

preprint2015arXiv

Compression of the electron repulsion integral tensor in tensor hypercontraction format with cubic scaling cost

The electron repulsion integral tensor has ubiquitous applications in quantum chemistry calculations. In this work, we propose an algorithm that compresses the electron repulsion tensor into the tensor hypercontraction format with $\mathcal{O}(n N^2 \log N)$ computational cost, where $N$ is the number of orbital functions and $n$ is the number of spatial grid points used to discretize each orbital function. The algorithm is based on a novel density fitting strategy that selects a subset of spatial grid points to approximate the pair products of orbital functions on the whole domain.
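
The key step named in the abstract is choosing a subset of grid points from which every pair product $\varphi_i(r)\varphi_j(r)$ can be interpolated. One standard way to make that choice, assumed here, is a column-pivoted QR over the pair-product matrix followed by a least-squares fit of interpolation vectors. The NumPy sketch below does this naively, at far more than the $\mathcal{O}(n N^2 \log N)$ cost quoted above, so it illustrates the interpolation idea only and not the paper's fast algorithm; `greedy_pivots` and `interpolate_pair_products` are hypothetical names.

```python
import numpy as np


def greedy_pivots(mat: np.ndarray, k: int) -> list:
    """Choose k columns of a real matrix by column-pivoted Gram-Schmidt (QRCP)."""
    residual = np.array(mat, dtype=float)
    chosen = []
    for _ in range(k):
        norms = np.einsum("ij,ij->j", residual, residual)  # squared column norms
        p = int(np.argmax(norms))
        chosen.append(p)
        q = residual[:, p] / np.sqrt(norms[p])
        residual -= np.outer(q, q @ residual)  # project out the chosen direction
    return chosen


def interpolate_pair_products(phi: np.ndarray, k: int):
    """Fit all pair products phi_i * phi_j from their values at k grid points.

    phi has shape (n_grid, n_orb). Returns the selected grid indices and
    interpolation vectors theta of shape (n_grid, k) such that
    phi[r, i] * phi[r, j] ~= sum_mu theta[r, mu] * phi[p_mu, i] * phi[p_mu, j].
    """
    n_grid, n_orb = phi.shape
    pairs = (phi[:, :, None] * phi[:, None, :]).reshape(n_grid, n_orb * n_orb)
    points = greedy_pivots(pairs.T, k)  # columns of pairs.T are grid points
    # Least squares for theta in theta @ pairs[points] ~= pairs.
    theta = np.linalg.lstsq(pairs[points].T, pairs.T, rcond=None)[0].T
    return points, theta


# Four sine "orbitals" on a 1D grid: their 16 pair products span only a
# 7-dimensional space of cosines, so 7 well-chosen grid points suffice.
x = np.linspace(0.0, 1.0, 200)
phi = np.stack([np.sin((i + 1) * np.pi * x) for i in range(4)], axis=1)
points, theta = interpolate_pair_products(phi, 7)
```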

preprint2015arXiv

Crystal image analysis using $2D$ synchrosqueezed transforms

We propose efficient algorithms based on a band-limited version of 2D synchrosqueezed transforms to extract mesoscopic and microscopic information from atomic crystal images. The methods analyze atomic crystal images as an assemblage of non-overlapping segments of 2D general intrinsic mode type functions, which are superpositions of non-linear wave-like components. In particular, crystal defects are interpreted as the irregularity of local energy; crystal rotations are described as the angle deviation of local wave vectors from their references; the gradient of a crystal elastic deformation can be obtained by a linear system generated by local wave vectors. Several numerical examples of synthetic and real crystal images are provided to illustrate the efficiency, robustness, and reliability of our methods.

preprint2015arXiv

Fast algorithm for periodic density fitting for Bloch waves

We propose an efficient algorithm for density fitting of Bloch waves for Hamiltonian operators with periodic potentials. The algorithm is based on column selection and random Fourier projection of the orbital functions. The computational cost of the algorithm scales as $\mathcal{O}\bigl(N_{\text{grid}} N^2 + N_{\text{grid}} NK \log (NK)\bigr)$, where $N_{\text{grid}}$ is the number of spatial grid points, $K$ is the number of sampling $k$-points in the first Brillouin zone, and $N$ is the number of bands under consideration. We validate the algorithm by numerical examples in both two and three dimensions.

preprint2015arXiv

Hierarchical interpolative factorization for elliptic operators: differential equations

This paper introduces the hierarchical interpolative factorization for elliptic partial differential equations (HIF-DE) in two (2D) and three dimensions (3D). This factorization takes the form of an approximate generalized LU/LDL decomposition that facilitates the efficient inversion of the discretized operator. HIF-DE is based on the multifrontal method but uses skeletonization on the separator fronts to sparsify the dense frontal matrices and thus reduce the cost. We conjecture that this strategy yields linear complexity in 2D and quasilinear complexity in 3D. Estimated linear complexity in 3D can be achieved by skeletonizing the compressed fronts themselves, which amounts geometrically to a recursive dimensional reduction scheme. Numerical experiments support our claims and further demonstrate the performance of our algorithm as a fast direct solver and preconditioner. MATLAB codes are freely available.

preprint2015arXiv

Hierarchical interpolative factorization for elliptic operators: integral equations

This paper introduces the hierarchical interpolative factorization for integral equations (HIF-IE) associated with elliptic problems in two and three dimensions. This factorization takes the form of an approximate generalized LU decomposition that permits the efficient application of the discretized operator and its inverse. HIF-IE is based on the recursive skeletonization algorithm but incorporates a novel combination of two key features: (1) a matrix factorization framework for sparsifying structured dense matrices and (2) a recursive dimensional reduction strategy to decrease the cost. Thus, higher-dimensional problems are effectively mapped to one dimension, and we conjecture that constructing, applying, and inverting the factorization all have linear or quasilinear complexity. Numerical experiments support this claim and further demonstrate the performance of our algorithm as a generalized fast multipole method, direct solver, and preconditioner. HIF-IE is compatible with geometric adaptivity and can handle both boundary and volume problems. MATLAB codes are freely available.

preprint2015arXiv

Recursive Sweeping Preconditioner for the 3D Helmholtz Equation

This paper introduces the recursive sweeping preconditioner for the numerical solution of the Helmholtz equation in 3D. This is based on the earlier work of the sweeping preconditioner with the moving perfectly matched layers (PMLs). The key idea is to apply the sweeping preconditioner recursively to the quasi-2D auxiliary problems introduced in the 3D sweeping preconditioner. Compared to the non-recursive 3D sweeping preconditioner, the setup cost of this new approach drops from $O(N^{4/3})$ to $O(N)$, the application cost per iteration drops from $O(N\log N)$ to $O(N)$, and the iteration count only increases mildly when combined with the standard GMRES solver. Several numerical examples are tested and the results are compared with the non-recursive sweeping preconditioner to demonstrate the efficiency of the new approach.

preprint2015arXiv

Sparsifying preconditioner for soliton calculations

We develop a robust and efficient method for soliton calculations for nonlinear Schrödinger equations. The method is based on the recently developed sparsifying preconditioner combined with Newton's iterative method. The performance of the method is demonstrated by numerical examples of gap solitons in the context of nonlinear optics.

preprint2014arXiv

Compressed representation of Kohn-Sham orbitals via selected columns of the density matrix

Given a set of Kohn-Sham orbitals from an insulating system, we present a simple, robust, efficient and highly parallelizable method to construct a set of, optionally orthogonal, localized basis functions for the associated subspace. Our method explicitly uses the fact that density matrices associated with insulating systems decay exponentially along the off-diagonal direction in the real space representation. Our method avoids the usage of an optimization procedure, and the localized basis functions are constructed directly from a set of selected columns of the density matrix (SCDM). Consequently, the only adjustable parameter in our method is the truncation threshold of the localized basis functions. Our method can be used in any electronic structure software package with an arbitrary basis set. We demonstrate the numerical accuracy and parallel scalability of the SCDM procedure using orbitals generated by the Quantum ESPRESSO software package. We also demonstrate a procedure for combining SCDM with Hockney's algorithm to efficiently perform Hartree-Fock exchange energy calculations with near linear scaling.
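
A minimal NumPy sketch of the SCDM idea as described above, assuming real-valued orbitals stored as the orthonormal columns of an $n_{\text{grid}} \times N$ array $\Psi$: a column-pivoted QR of $\Psi^T$ selects $N$ grid points $C$, and the density matrix columns $P[:, C] = \Psi \Psi[C,:]^T$ are the localized functions. The orthonormalization step uses a Löwdin-type polar factor, which is one reasonable choice and not necessarily the paper's; the helper names and the Gaussian test orbitals are hypothetical.

```python
import numpy as np


def greedy_pivots(mat: np.ndarray, k: int) -> list:
    """Choose k columns of a real matrix by column-pivoted Gram-Schmidt (QRCP)."""
    residual = np.array(mat, dtype=float)
    chosen = []
    for _ in range(k):
        norms = np.einsum("ij,ij->j", residual, residual)  # squared column norms
        p = int(np.argmax(norms))
        chosen.append(p)
        q = residual[:, p] / np.sqrt(norms[p])
        residual -= np.outer(q, q @ residual)  # project out the chosen direction
    return chosen


def scdm(psi: np.ndarray):
    """SCDM-style localization for real orbitals psi of shape (n_grid, n_orb).

    Returns the selected grid points C, the density-matrix columns
    P[:, C] = psi @ psi[C].T, and an orthonormalized version of them.
    """
    n_orb = psi.shape[1]
    cols = greedy_pivots(psi.T, n_orb)  # QRCP on psi^T picks n_orb grid points
    selected = psi @ psi[cols].T        # columns of P = psi psi^T at those points
    # Lowdin-type orthonormalization: psi @ (polar factor of psi[cols].T),
    # which equals P[:, C] (psi_C psi_C^T)^(-1/2).
    w, _, vt = np.linalg.svd(psi[cols].T)
    orthonormal = psi @ (w @ vt)
    return cols, selected, orthonormal


# Localized Gaussian bumps, orthonormalized, then mixed by a random rotation
# into delocalized orbitals spanning the same subspace.
grid = np.arange(100)
centers = [10, 30, 50, 70, 90]
bumps = np.stack([np.exp(-((grid - c) / 2.0) ** 2 / 2.0) for c in centers], axis=1)
local, _ = np.linalg.qr(bumps)
rotation, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((5, 5)))
cols, selected, phi = scdm(local @ rotation)
print(sorted(cols))
```

No optimization loop is involved: the only choices are the pivoted QR and, in the full method, a truncation threshold for the localized functions.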

preprint2014arXiv

Directional Preconditioner for High Frequency Obstacle Scattering

The boundary integral method is an efficient approach for solving time-harmonic obstacle scattering problems by a bounded scatterer. This paper presents the directional preconditioner for the iterative solution of linear systems of the boundary integral method. This new preconditioner builds a data-sparse approximation of the integral operator, transforms it into a sparse linear system, and computes an approximate inverse with efficient sparse and hierarchical linear algebra algorithms. This preconditioner is efficient and results in small and almost frequency-independent iteration counts when combined with standard iterative solvers. Numerical results are provided to demonstrate the effectiveness of the new preconditioner.

preprint2014arXiv

Fast Directional Computation of High Frequency Boundary Integrals via Local FFTs

The boundary integral method is an efficient approach for solving time-harmonic acoustic obstacle scattering problems. The main computational task is the evaluation of an oscillatory boundary integral at each discretization point of the boundary. This paper presents a new fast algorithm for this task in two dimensions. This algorithm is built on top of directional low-rank approximations of the scattering kernel and uses oscillatory Chebyshev interpolation and local FFTs to achieve quasi-linear complexity. The algorithm is simple, fast, and kernel-independent. Numerical results are provided to demonstrate the effectiveness of the proposed algorithm.

preprint2014arXiv

Pole expansion for solving a type of parametrized linear systems in electronic structure calculations

We present a new method for solving parametrized linear systems. Under certain assumptions on the parametrization, solutions to the linear systems for all parameters can be accurately approximated by linear combinations of solutions to linear systems for a small set of fixed parameters. Combined with either direct solvers or preconditioned iterative solvers for each linear system with a fixed parameter, the method is particularly suitable for situations when solutions to a large number of distinct parameters or a large number of right hand sides are required. The method is also simple to parallelize. We demonstrate the applicability of the method to the calculation of the response functions in electronic structure theory. We demonstrate the numerical performance of the method using a benzene molecule and a DNA molecule.

preprint2014arXiv

Sparsifying preconditioner for pseudospectral approximations of indefinite systems on periodic structures

This paper introduces the sparsifying preconditioner for the pseudospectral approximation of highly indefinite systems on periodic structures, which include the frequency-domain response problems of the Helmholtz equation and the Schrödinger equation as examples. This approach transforms the dense system of the pseudospectral discretization approximately into a sparse system via an equivalent integral reformulation and a specially designed sparsifying operator. The resulting sparse system is then solved efficiently with sparse linear algebra algorithms and serves as a reasonably accurate preconditioner. When combined with standard iterative methods, this new preconditioner results in small iteration counts. Numerical results are provided for the Helmholtz equation and the Schrödinger equation in both 2D and 3D to demonstrate the effectiveness of this new preconditioner.

preprint2014arXiv

Sparsifying Preconditioner for the Lippmann-Schwinger Equation

The Lippmann-Schwinger equation is an integral equation formulation for acoustic and electromagnetic scattering from inhomogeneous media and quantum scattering from a localized potential. We present the sparsifying preconditioner for accelerating the iterative solution of the Lippmann-Schwinger equation. This new preconditioner transforms the discretized Lippmann-Schwinger equation into sparse form and leverages efficient sparse linear algebra algorithms for computing an approximate inverse. This preconditioner is efficient and easy to implement. When combined with standard iterative methods, it results in almost frequency-independent iteration counts. We provide 2D and 3D numerical results to demonstrate the effectiveness of this new preconditioner.

preprint2013arXiv

A parallel butterfly algorithm

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int K(x,y) g(y)\,dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O\bigl(r^2 \frac{N^d}{p} \log N + (\beta r \frac{N^d}{p} + \alpha) \log p\bigr)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where $K(x,y) = \exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a 3D generalized Radon transform were observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively.

preprint2013arXiv

A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations

A parallelization of a sweeping preconditioner for 3D Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be $O(\gamma^2 N^{4/3})$ and $O(\gamma N \log N)$, where $\gamma(\omega)$ denotes the modestly frequency-dependent number of grid points per Perfectly Matched Layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: "Parallel Sweeping Preconditioner (PSP)" and the underlying distributed multifrontal solver, "Clique".

preprint2013arXiv

Synchrosqueezed Curvelet Transform for 2D Mode Decomposition

This paper introduces the synchrosqueezed curvelet transform as an optimal tool for 2D mode decomposition of wavefronts or banded wave-like components. The synchrosqueezed curvelet transform consists of a generalized curvelet transform with application-dependent geometric scaling parameters, and a synchrosqueezing technique for a sharpened phase space representation. In the case of a superposition of banded wave-like components with well-separated wave-vectors, it is proved that the synchrosqueezed curvelet transform is capable of recognizing each component and precisely estimating local wave-vectors. A discrete analogue of the continuous transform and several clustering models for decomposition are proposed in detail. Some numerical examples with synthetic and real data are provided to demonstrate the above properties of the proposed transform.

preprint2012arXiv

Element orbitals for Kohn-Sham density functional theory

We present a method to discretize the Kohn-Sham Hamiltonian matrix in the pseudopotential framework by a small set of basis functions automatically contracted from a uniform basis set such as planewaves. Each basis function is localized around an element, which is a small part of the global domain containing multiple atoms. We demonstrate that the resulting basis set achieves meV accuracy for 3D densely packed systems with a small number of basis functions per atom. The procedure is applicable to insulating and metallic systems.

preprint2011arXiv

Adaptive local basis set for Kohn-Sham density functional theory in a discontinuous Galerkin framework I: Total energy calculation

Kohn-Sham density functional theory is one of the most widely used electronic structure theories. In the pseudopotential framework, uniform discretization of the Kohn-Sham Hamiltonian generally results in a large number of basis functions per atom in order to resolve the rapid oscillations of the Kohn-Sham orbitals around the nuclei. Previous attempts to reduce the number of basis functions per atom include the usage of atomic orbitals and similar objects, but the atomic orbitals generally require fine tuning in order to reach high accuracy. We present a novel discretization scheme that adaptively and systematically builds the rapid oscillations of the Kohn-Sham orbitals around the nuclei as well as environmental effects into the basis functions. The resulting basis functions are localized in the real space, and are discontinuous in the global domain. The continuous Kohn-Sham orbitals and the electron density are evaluated from the discontinuous basis functions using the discontinuous Galerkin (DG) framework. Our method is implemented in parallel and the current implementation is able to handle systems with at least thousands of atoms. Numerical examples indicate that our method can reach very high accuracy (less than 1 meV) with a very small number ($4\sim 40$) of basis functions per atom.

preprint2011arXiv

Optimized local basis set for Kohn-Sham density functional theory

We develop a technique for generating a set of optimized local basis functions to solve models in the Kohn-Sham density functional theory for both insulating and metallic systems. The optimized local basis functions are obtained by solving a minimization problem in an admissible set determined by a large number of primitive basis functions. Using the optimized local basis set, the electron energy and the atomic force can be calculated accurately with a small number of basis functions. The Pulay force is systematically controlled and is not required to be calculated, which makes the optimized local basis set an ideal tool for ab initio molecular dynamics and structure optimization. We also propose a preconditioned Newton-GMRES method to obtain the optimized local basis functions in practice. The optimized local basis set is able to achieve high accuracy with a small number of basis functions per atom when applied to a one-dimensional model problem.

preprint2010arXiv

Fast construction of hierarchical matrix representation from matrix-vector multiplication

We develop a hierarchical matrix construction algorithm using matrix-vector multiplications, based on the randomized singular value decomposition of low-rank matrices. The algorithm uses $\mathcal{O}(\log n)$ applications of the matrix on structured random test vectors and $\mathcal{O}(n \log n)$ extra computational cost, where $n$ is the dimension of the unknown matrix. Numerical examples on constructing Green's functions for elliptic operators in two dimensions show efficiency and accuracy of the proposed algorithm.
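
The building block behind this construction is the randomized SVD: a matrix that is numerically low-rank can be recovered from its action on a few random vectors. The NumPy sketch below shows only that building block for a single globally low-rank matrix, using products with $A$ and $A^T$; the paper's structured random test vectors, which recover all blocks of a hierarchical matrix with $\mathcal{O}(\log n)$ applications, are not reproduced here. The callback interface and the function name are hypothetical.

```python
import numpy as np


def low_rank_from_products(matvec, rmatvec, n, rank, oversample=5, seed=0):
    """Randomized SVD of an n x n operator seen only through products.

    matvec(X) must return A @ X and rmatvec(X) must return A.T @ X for a
    block X of column vectors (hypothetical callbacks for this sketch).
    """
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((n, rank + oversample))  # random test vectors
    q, _ = np.linalg.qr(matvec(omega))   # orthonormal basis for the sampled range
    b = rmatvec(q).T                     # B = Q^T A, obtained as (A^T Q)^T
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]


# A rank-5 test matrix, accessed only through products with A and A^T.
rng = np.random.default_rng(1)
a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))
u, s, vt = low_rank_from_products(lambda x: a @ x, lambda x: a.T @ x, 200, 5)
```

Here the matrix is touched through $2(r + p)$ products for rank $r$ and oversampling $p$, with the extra dense work done on thin $n \times (r + p)$ blocks.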

preprint2010arXiv

Sweeping Preconditioner for the Helmholtz Equation: Hierarchical Matrix Representation

The paper introduces the sweeping preconditioner, which is highly efficient for iterative solutions of the variable coefficient Helmholtz equation including very high frequency problems. The first central idea of this novel approach is to construct an approximate factorization of the discretized Helmholtz equation by sweeping the domain layer by layer, starting from an absorbing layer or boundary condition. Given this specific order of factorization, the second central idea of this approach is to represent the intermediate matrices in the hierarchical matrix framework. In two dimensions, both the construction and the application of the preconditioners are of linear complexity. The GMRES solver with the resulting preconditioner converges in an amazingly small number of iterations, which is essentially independent of the number of unknowns. This approach is also extended to the three dimensional case with some success. Numerical results are provided in both two and three dimensions to demonstrate the efficiency of this new approach.

preprint2010arXiv

Sweeping Preconditioner for the Helmholtz Equation: Moving Perfectly Matched Layers

This paper introduces a new sweeping preconditioner for the iterative solution of the variable coefficient Helmholtz equation in two and three dimensions. The algorithms follow the general structure of constructing an approximate $LDL^t$ factorization by eliminating the unknowns layer by layer starting from an absorbing layer or boundary condition. The central idea of this paper is to approximate the Schur complement matrices of the factorization using moving perfectly matched layers (PMLs) introduced in the interior of the domain. Applying each Schur complement matrix is equivalent to solving a quasi-1D problem with a banded LU factorization in the 2D case and to solving a quasi-2D problem with a multifrontal method in the 3D case. The resulting preconditioner has linear application cost and the preconditioned iterative solver converges in a number of iterations that is essentially independent of the number of unknowns or the frequency. Numerical results are presented in both two and three dimensions to demonstrate the efficiency of this new preconditioner.
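
For a discretization ordered layer by layer, the matrix is block tridiagonal, and eliminating the unknowns layer by layer produces the Schur complements $S_1 = A_{11}$ and $S_k = A_{kk} - A_{k,k-1} S_{k-1}^{-1} A_{k-1,k}$. The NumPy sketch below performs this elimination exactly as a block LU solve (for symmetric $A$ it is the $LDL^t$ form). The sweeping preconditioner's contribution is to replace the exact, expensive applications of $S_k^{-1}$ with moving-PML approximations, which are not implemented here. `sweep_solve` is a hypothetical name, and random test blocks would stand in for a real Helmholtz discretization.

```python
import numpy as np


def sweep_solve(diag, lower, upper, rhs):
    """Solve a block tridiagonal system by eliminating one layer at a time.

    diag[k]  : block A[k, k]
    lower[k] : block A[k + 1, k]
    upper[k] : block A[k, k + 1]
    rhs[k]   : right-hand side restricted to layer k
    """
    num_layers = len(diag)
    schur = [diag[0]]
    y = [rhs[0]]
    # Forward sweep: S_k = A_kk - A_{k,k-1} S_{k-1}^{-1} A_{k-1,k}.
    # A sweeping preconditioner would replace these exact solves with cheap
    # approximations of S_{k-1}^{-1}; here they are computed exactly.
    for k in range(1, num_layers):
        schur.append(diag[k] - lower[k - 1] @ np.linalg.solve(schur[k - 1], upper[k - 1]))
        y.append(rhs[k] - lower[k - 1] @ np.linalg.solve(schur[k - 1], y[k - 1]))
    # Backward sweep: S_k x_k = y_k - A_{k,k+1} x_{k+1}.
    x = [None] * num_layers
    x[-1] = np.linalg.solve(schur[-1], y[-1])
    for k in range(num_layers - 2, -1, -1):
        x[k] = np.linalg.solve(schur[k], y[k] - upper[k] @ x[k + 1])
    return np.concatenate(x)
```

Each layer costs one dense solve with $S_k$ here, which is what makes the exact factorization expensive in 2D and 3D and motivates approximating the Schur complements.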


57 published item(s)

Multimodal Sampling via Approximate Symmetries

Variational Actor-Critic Algorithms

A semigroup method for high dimensional elliptic PDEs and eigenvalue problems based on neural networks

Analytic continuation from limited noisy Matsubara data

Annealed importance sampling for Ising models with mixed boundary conditions

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

Correcting Convexity Bias in Function and Functional Estimate

Double Flip Move for Ising Models with Mixed Boundary Conditions

Enterprise-Scale Search: Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees

Operator Shifting for General Noisy Matrix Systems

Operator Shifting for Noisy Elliptic Systems

Pole recovery from noisy data on imaginary axis

Provably convergent quasistatic dynamics for mean-field two-player zero-sum games

A Simple Multiscale Method for Mean Field Games

How to Learn when Data Reacts to Your Model: Performative Gradient Descent

Multi-Level Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data

Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

A heuristic independent particle approximation to determinantal point processes

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

A Sharp Convergence Rate for the Asynchronous Stochastic Gradient Descent

A simple solver for the fractional Laplacian in multiple dimensions

Borrowing From the Future: Addressing Double Sampling in Model-free Control

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

Meta-learning Pseudo-differential Operators with Deep Neural Networks

Mirror Descent Algorithms for Minimizing Interacting Free Energy

Natural Gradient for Combined Loss Using Wavelets

Semidefinite relaxation of multi-marginal optimal transport for strictly correlated electrons in second quantization

Solving Electrical Impedance Tomography with Deep Learning

Stochastic modified equations for the asynchronous stochastic gradient descent

Adaptively compressed polarizability operator for accelerating large scale \textit{ab initio} phonon calculations

Additive Sweeping Preconditioner for the Helmholtz Equation

SCDM-k: Localized orbitals for solids via selected columns of the density matrix

Tensor Network Skeletonization

A Multiscale Butterfly Algorithm for Multidimensional Fourier Integral Operators

Butterfly Factorization

Compression of the electron repulsion integral tensor in tensor hypercontraction format with cubic scaling cost

Crystal image analysis using $2D$ synchrosqueezed transforms

Fast algorithm for periodic density fitting for Bloch waves

Hierarchical interpolative factorization for elliptic operators: differential equations

Hierarchical interpolative factorization for elliptic operators: integral equations

Recursive Sweeping Preconditioner for the 3D Helmholtz Equation

Sparsifying preconditioner for soliton calculations

Compressed representation of Kohn-Sham orbitals via selected columns of the density matrix

Directional Preconditioner for High Frequency Obstacle Scattering

Fast Directional Computation of High Frequency Boundary Integrals via Local FFTs

Pole expansion for solving a type of parametrized linear systems in electronic structure calculations

Sparsifying preconditioner for pseudospectral approximations of indefinite systems on periodic structures

Sparsifying Preconditioner for the Lippmann-Schwinger Equation

A parallel butterfly algorithm

A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations

Synchrosqueezed Curvelet Transform for 2D Mode Decomposition

Element orbitals for Kohn-Sham density functional theory

Adaptive local basis set for Kohn-Sham density functional theory in a discontinuous Galerkin framework I: Total energy calculation

Optimized local basis set for Kohn-Sham density functional theory

Fast construction of hierarchical matrix representation from matrix-vector multiplication

Sweeping Preconditioner for the Helmholtz Equation: Hierarchical Matrix Representation

Sweeping Preconditioner for the Helmholtz Equation: Moving Perfectly Matched Layers