Trust snapshot

Trust: 21 (Emerging)
Verification: L1
Status: Unclaimed author
49 works · 0 followers · 28 topics · 4 close collaborators

Published work

49 published items

Preprint · 2026 · arXiv

HyperVision: A Channel-Adaptive Ground-Based Hyperspectral Vision Pre-trained Backbone

While hyperspectral imaging provides rich spatial-spectral information across hundreds of narrow wavelength bands for precise material identification, ground-based hyperspectral pre-trained backbones remain absent, constrained by varying spectral configurations across sensors, the scarcity and inconsistency of labels, and the limited scale and scene diversity of existing datasets. To address these challenges and enable universal perception, we propose HyperVision, the first ground-based hyperspectral pre-trained backbone. First, to handle varying spectral configurations, HyperVision adopts a channel-adaptive dynamic embedding mechanism to map heterogeneous inputs into a unified token space. Second, to address the scarcity and inconsistency of labels, we introduce a multi-source pseudo-labeling method that fuses semantic representations from both spatial structures generated by SAM2 and fine-grained spectral material information extracted by HyperFree. Third, to compensate for limited dataset scale and enrich scene diversity, a cross-modal knowledge distillation mechanism is used to transfer rich semantic representations from a pre-trained RGB vision model to our hyperspectral backbone. Pre-trained on a collection of 15k images from 26 diverse ground-based datasets, HyperVision demonstrates exceptional generalization. Requiring only efficient head-only adaptation without adjusting backbone parameters, it achieves state-of-the-art performance compared with task-specific methods across three downstream tasks under varying sensor configurations, yielding up to a 16.3% relative improvement in hyperspectral semantic segmentation $\mathrm{Acc}_{\mathrm{M}}$, a 2.1% relative gain in object tracking AUC, and a 35.5% reduction in salient object detection MAE. The source code and pre-trained model will be publicly available at https://github.com/lronkitty/HyperVision.

Preprint · 2022 · arXiv

A deep learning framework for geodesics under spherical Wasserstein-Fisher-Rao metric and its application for weighted sample generation

Wasserstein-Fisher-Rao (WFR) distance is a family of metrics that gauge the discrepancy between two Radon measures, taking into account both transportation and weight change. Spherical WFR distance is a projected version of WFR distance for probability measures, so that the space of Radon measures equipped with WFR can be viewed as a metric cone over the space of probability measures equipped with spherical WFR. Compared with the Wasserstein case, geodesics under the spherical WFR metric are less well understood and remain an active research focus. In this paper, we develop a deep learning framework to compute geodesics under the spherical WFR metric, and the learned geodesics can be used to generate weighted samples. Our approach is based on a Benamou-Brenier type dynamic formulation for spherical WFR. To overcome the difficulty of enforcing the boundary constraint introduced by the weight change, a Kullback-Leibler (KL) divergence term based on the inverse map is added to the cost function. Moreover, a new regularization term based on the particle velocity is introduced as a substitute for the Hamilton-Jacobi equation for the potential in the dynamic formulation. When used for sample generation, our framework can benefit applications with given weighted samples, especially Bayesian inference, compared with sample generation by previous flow models.

Preprint · 2022 · arXiv

Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks

We propose a novel numerical method for high dimensional Hamilton--Jacobi--Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs, reformulated as optimal control problems, are tackled by the actor-critic framework inspired by reinforcement learning, based on neural network parametrization of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function, we derive a variance reduced least-squares temporal difference method using stochastic calculus. To numerically discretize the stochastic control problem, we employ an adaptive step size scheme to improve the accuracy near the domain boundary. Numerical examples up to $20$ spatial dimensions including the linear quadratic regulators, the stochastic Van der Pol oscillators, the diffusive Eikonal equations, and fully nonlinear elliptic PDEs derived from a regulator problem are presented to validate the effectiveness of our proposed method.

Preprint · 2022 · arXiv

Algebraic localization implies exponential localization in non-periodic insulators

Exponentially-localized Wannier functions are a basis of the Fermi projection of a Hamiltonian consisting of functions which decay exponentially fast in space. In two and three spatial dimensions, it is well understood for periodic insulators that exponentially-localized Wannier functions exist if and only if there exists an orthonormal basis for the Fermi projection with finite second moment (i.e. all basis elements satisfy $\int |\boldsymbol{x}|^2 |w(\boldsymbol{x})|^2 \,\text{d}{\boldsymbol{x}} < \infty$). In this work, we establish a similar result for non-periodic insulators in two spatial dimensions. In particular, we prove that if there exists an orthonormal basis for the Fermi projection which satisfies $\int |\boldsymbol{x}|^{5 + ε} |w(\boldsymbol{x})|^2 \,\text{d}{\boldsymbol{x}} < \infty$ for some $ε> 0$, then there also exists an orthonormal basis for the Fermi projection which decays exponentially fast in space. This result lends support to the Localization Dichotomy Conjecture for non-periodic systems recently proposed by Marcelli, Monaco, Moscolari, and Panati.

Preprint · 2022 · arXiv

Asymptotic analysis of diabatic surface hopping algorithm in the adiabatic and non-adiabatic limits

Surface hopping algorithms, an important class of quantum dynamics simulation algorithms for non-adiabatic dynamics, are typically performed in the adiabatic representation, which can break down in the presence of ill-defined adiabatic potential energy surfaces (PESs) and adiabatic coupling terms. Another issue of surface hopping algorithms is the difficulty in capturing the correct scaling of the transition rate in the Marcus (weak-coupling/non-adiabatic) regime. Though the first issue can be circumvented by exploiting the diabatic representation, diabatic surface hopping algorithms usually lack theoretical justification. We consider the diabatic surface hopping algorithm proposed in [Fang, Lu. Multiscale Model. Simul. 16:4, 1603-1622, 2018] and provide an asymptotic analysis of the transition rate in the Marcus regime that justifies the correct scaling for the spin-boson model. We propose two conditions that guarantee correctness for general potentials. In the opposite (strong-coupling/adiabatic) regime, we derive the asymptotic behavior of the algorithm, which interestingly matches a type of mean-field description. The techniques used here may shed light on the analysis of other diabatic-based algorithms.

Preprint · 2022 · arXiv

Complexity of zigzag sampling algorithm for strongly log-concave distributions

We study the computational complexity of the zigzag sampling algorithm for strongly log-concave distributions. The zigzag process has the advantages of not requiring time discretization for implementation and of requiring only one evaluation of a partial derivative of the potential for each proposed bouncing event, while its convergence rate is dimension independent. Using these properties, we prove that the zigzag sampling algorithm achieves $\varepsilon$ error in chi-square divergence with a computational cost equivalent to $O\bigl(κ^2 d^\frac{1}{2}(\log\frac{1}{\varepsilon})^{\frac{3}{2}}\bigr)$ gradient evaluations in the regime $κ\ll \frac{d}{\log d}$ under a warm start assumption, where $κ$ is the condition number and $d$ is the dimension.
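The absence of time discretization is easiest to see in one dimension. The sketch below assumes a standard Gaussian target, U(x) = x²/2, for which the event time can be drawn exactly by inverting the integrated switching rate, so no Poisson thinning (and hence no rejected proposals) is needed. The function name and parameters are hypothetical; this illustrates the zigzag process itself, not the paper's complexity analysis.

```python
import math
import random


def zigzag_gaussian_1d(total_time, x0=0.0, v0=1, seed=0):
    """Simulate a 1-D zigzag process targeting N(0, 1) (illustrative sketch).

    Potential U(x) = x^2 / 2, so U'(x) = x, and the switching rate along the
    ray x + v*s is max(0, v*x + s) because v**2 == 1.
    Returns time-averaged estimates of E[x] and E[x^2] along the path.
    """
    rng = random.Random(seed)
    x, v, t = x0, v0, 0.0
    int_x = 0.0   # integral of x(t) dt along the piecewise-linear path
    int_x2 = 0.0  # integral of x(t)^2 dt along the path
    while t < total_time:
        a = v * x                  # rate at s = 0 is max(0, a)
        e = rng.expovariate(1.0)   # Exp(1) threshold for the next event
        # Solve int_0^tau max(0, a + s) ds = e exactly.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        tau = min(tau, total_time - t)  # stop exactly at total_time
        # Exact integrals over the linear segment x + v*s, s in [0, tau].
        int_x += x * tau + v * tau ** 2 / 2.0
        int_x2 += x * x * tau + x * v * tau ** 2 + tau ** 3 / 3.0
        x += v * tau
        t += tau
        v = -v                     # flip the velocity at the event
    return int_x / total_time, int_x2 / total_time
```

For a long horizon, e.g. `zigzag_gaussian_1d(20000.0)`, the two estimates should be close to 0 and 1, the first two moments of N(0, 1).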

Preprint · 2022 · arXiv

Fast Algorithms of Bath Calculations in Simulations of Quantum System-Bath Dynamics

We present fast algorithms for the summation of Dyson series and the inchworm Monte Carlo method for quantum systems that are coupled with harmonic baths. The algorithms are based on evolving the integro-differential equations where the most expensive part comes from the computation of bath influence functionals. To accelerate the computation, we design fast algorithms based on reusing the bath influence functionals computed in the previous time steps to reduce the number of calculations. It is proven that the proposed fast algorithms reduce the number of such calculations by a factor of $O(N)$, where $N$ is the total number of time steps. Numerical experiments are carried out to show the efficiency of the method and to verify the theoretical results.

Preprint · 2022 · arXiv

Low-rank approximation for multiscale PDEs

Historically, the analysis of multiscale PDEs is largely unified, while numerical schemes tend to be equation-specific. In this paper, we propose a unified framework for computing multiscale problems through random sampling. This is achieved by incorporating randomized SVD solvers and manifold learning techniques to numerically reconstruct the low-rank features of multiscale PDEs. We use the multiscale radiative transfer equation and an elliptic equation with rough media to showcase the application of this framework.
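The randomized SVD ingredient can be sketched with a Gaussian range finder in the style of Halko, Martinsson, and Tropp. This assumes a dense matrix held in memory; in a PDE solver, multiplying by the random test matrix would presumably correspond to solves with random data, but that mapping is an assumption here, and the function name and defaults are hypothetical.

```python
import numpy as np


def randomized_svd(a, rank, oversample=10, n_iter=2, seed=0):
    """Approximate rank-`rank` SVD of `a` via a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + oversample, m, n)
    # Sample the range of `a` with a Gaussian test matrix.
    y = a @ rng.standard_normal((n, k))
    # Power iterations (with re-orthonormalization) help when the
    # singular values decay slowly.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(y)
        y = a @ (a.T @ q)
    q, _ = np.linalg.qr(y)  # orthonormal basis for the sampled range
    b = q.T @ a             # small k x n projection of `a`
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small         # lift left singular vectors back to R^m
    return u[:, :rank], s[:rank], vt[:rank, :]
```

When `a` has exact rank at most `rank`, the sampled range contains the column space of `a`, so the factorization reproduces `a` up to rounding error.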

Preprint · 2022 · arXiv

Neural Network Based Variational Methods for Solving Quadratic Porous Medium Equations in High Dimensions

In this paper, we propose and study neural network based methods for solving the high-dimensional quadratic porous medium equation (QPME). Three variational formulations of this nonlinear PDE are presented: a strong formulation and two weak formulations. For the strong formulation, the solution is directly parameterized with a neural network and optimized by minimizing the PDE residual. It can be proved that convergence of the optimization problem guarantees convergence of the approximate solution in the $L^1$ sense. The weak formulations are derived following Brenier, Y., 2020, which characterizes the very weak solutions of QPME. Specifically, the solutions are represented with intermediate functions, which are parameterized with neural networks and trained to optimize the weak formulations. Extensive numerical tests are further carried out to investigate the pros and cons of each formulation in low and high dimensions. This is an initial exploration along the line of solving high-dimensional nonlinear PDEs with neural network based methods, which we hope can provide useful experience for future investigations.

Preprint · 2022 · arXiv

On the closedness and geometry of tensor network state sets

Tensor network states (TNS) are a powerful approach for the study of strongly correlated quantum matter. The curse of dimensionality is addressed by parametrizing the many-body state in terms of a network of partially contracted tensors. These tensors form a substantially reduced set of effective degrees of freedom. In practical algorithms, functionals like energy expectation values or overlaps are optimized over certain sets of TNS. Concerning algorithmic stability, it is important whether the considered sets are closed because, otherwise, the algorithms may approach a boundary point that is outside the TNS set and tensor elements diverge. We discuss the closedness and geometries of TNS sets, and we propose regularizations for optimization problems on non-closed TNS sets. We show that sets of matrix product states (MPS) with open boundary conditions, tree tensor network states (TTNS), and the multiscale entanglement renormalization ansatz (MERA) are always closed, whereas sets of translation-invariant MPS with periodic boundary conditions (PBC), heterogeneous MPS with PBC, and projected entangled-pair states (PEPS) are generally not closed. The latter is shown using explicit examples such as the W state, states that we call two-domain states, and fine-grained versions thereof.
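The W state is the classic example of a point missing from a set of low-complexity tensors. It can be approximated arbitrarily well by a sum of two product states, yet it is not itself such a sum (its tensor rank is 3). The sketch below shows this tensor-rank version of the phenomenon, offered as an analogy rather than the paper's translation-invariant MPS construction. Note the 1/eps coefficients, which mirror the abstract's remark that tensor elements diverge as the boundary point is approached. The helper names are hypothetical.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])


def kron3(a, b, c):
    """Three-qubit product state a (x) b (x) c."""
    return np.kron(np.kron(a, b), c)


# Unnormalized W state |001> + |010> + |100>.
w = kron3(ket0, ket0, ket1) + kron3(ket0, ket1, ket0) + kron3(ket1, ket0, ket0)


def product_difference(eps):
    """((|0> + eps|1>)^{(x)3} - |000>) / eps: a sum of two product states.

    Its coefficients scale like 1/eps, and it tends to w as eps -> 0.
    """
    phi = ket0 + eps * ket1
    return (kron3(phi, phi, phi) - kron3(ket0, ket0, ket0)) / eps


for eps in [1e-1, 1e-2, 1e-3]:
    err = np.linalg.norm(product_difference(eps) - w)
    print(f"eps={eps:g}  distance to W = {err:.2e}  coefficient scale = {1 / eps:g}")
```

Expanding the product shows the distance is exactly sqrt(3 eps² + eps⁴), so it shrinks linearly in eps while the coefficients grow like 1/eps.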

Preprint · 2022 · arXiv

Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction

Previous works on human motion prediction follow the pattern of building a mapping between the observed sequence and the one to be predicted. However, due to the inherent complexity of multivariate time series data, finding the extrapolation relation between motion sequences remains a challenge. In this paper, we present a new prediction pattern, which introduces previously overlooked human poses, to implement the prediction task from the view of interpolation. These poses occur after the predicted sequence and form the privileged sequence. Specifically, we first propose an InTerPolation learning Network (ITP-Network) that encodes both the observed sequence and the privileged sequence to interpolate the in-between predicted sequence, wherein the embedded Privileged-sequence-Encoder (Priv-Encoder) learns the privileged knowledge (PK) simultaneously. Then, we propose a Final Prediction Network (FP-Network), for which the privileged sequence is not observable, but which is equipped with a novel PK-Simulator that distills the PK learned by the previous network. This simulator takes the observed sequence as input but approximates the behavior of the Priv-Encoder, enabling the FP-Network to imitate the interpolation process. Extensive experimental results demonstrate that our prediction pattern achieves state-of-the-art performance on the benchmark H3.6M, CMU-Mocap, and 3DPW datasets in both short-term and long-term predictions.

Preprint · 2022 · arXiv

Posterior computation with the Gibbs zig-zag sampler

An intriguing new class of piecewise deterministic Markov processes (PDMPs) has recently been proposed as an alternative to Markov chain Monte Carlo (MCMC). In order to facilitate the application to a larger class of problems, we propose a new class of PDMPs termed Gibbs zig-zag samplers, which allow parameters to be updated in blocks with a zig-zag sampler applied to certain parameters and traditional MCMC-style updates to others. We demonstrate the flexibility of this framework on posterior sampling for logistic models with shrinkage priors for high-dimensional regression and random effects and provide conditions for geometric ergodicity and the validity of a central limit theorem.

Preprint · 2022 · arXiv

Quantum Orbital Minimization Method for Excited States Calculation on Quantum Computer

We propose a quantum-classical hybrid variational algorithm, the quantum orbital minimization method (qOMM), for obtaining the ground state and low-lying excited states of a Hermitian operator. Given parameterized ansatz circuits representing eigenstates, qOMM implements quantum circuits to represent the objective function of the orbital minimization method and adopts a classical optimizer to minimize the objective function with respect to the parameters of the ansatz circuits. The objective function has orthogonality implicitly embedded, which allows qOMM to apply a different ansatz circuit to each reference state. We carry out numerical simulations that seek the excited states of $\text{H}_{2}$, $\text{LiH}$, and a toy model consisting of 4 hydrogen atoms arranged in a square lattice, using the STO-3G basis and UCCSD ansatz circuits. Compared with existing excited-state methods, qOMM is less prone to getting stuck in local minima and can achieve convergence with shallower ansatz circuits.
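The orbital minimization objective behind qOMM can be tried classically. The sketch below uses the standard OMM functional E(X) = tr[(2I − XᵀX) XᵀHX], assuming H has been shifted to be negative definite, and minimizes it over a plain matrix X by gradient descent. Quantum circuits and ansatz parameters are replaced by explicit matrix products here, so this shows only the implicit-orthogonality idea, not the hybrid algorithm. Function names, step size, and iteration count are hypothetical.

```python
import numpy as np


def omm_energy(x, h):
    """OMM functional E(X) = tr[(2I - X^T X) X^T H X]."""
    s = x.T @ x
    a = x.T @ h @ x
    return np.trace((2.0 * np.eye(x.shape[1]) - s) @ a)


def omm_gradient(x, h):
    """Gradient of E for symmetric H: 4HX - 2X(X^T H X) - 2HX(X^T X)."""
    hx = h @ x
    return 4.0 * hx - 2.0 * x @ (x.T @ hx) - 2.0 * hx @ (x.T @ x)


def omm_minimize(h, k, step=0.01, n_steps=5000, seed=0):
    """Plain gradient descent on E from a small random start (k columns)."""
    rng = np.random.default_rng(seed)
    x = 0.1 * rng.standard_normal((h.shape[0], k))
    for _ in range(n_steps):
        x = x - step * omm_gradient(x, h)
    return x
```

For negative definite H, the minimum value equals the sum of the k lowest eigenvalues, and at the minimizer XᵀX = I even though no orthogonality constraint is imposed explicitly.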

Preprint · 2022 · arXiv

Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

We propose a single time-scale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem. A least squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor. We give a proof of convergence with sample complexity $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$. The method in the proof is applicable to general single time-scale bilevel optimization problems. We also numerically validate our theoretical convergence results.
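The actor in the abstract takes natural policy gradient steps. For the policy u = −Kx, the natural gradient direction for LQR is E_K = (R + BᵀP_K B)K − BᵀP_K A, where P_K is the value matrix of the current policy. The scalar sketch below assumes the model (a, b) is known, so P_K is computed in closed form. This swaps in an exact critic for the paper's sample-based LSTD critic and is not the paper's single time-scale scheme. All function names and parameter values are hypothetical.

```python
def closed_loop_cost(k, a, b, q, r):
    """P_k for u = -k x, from P = q + r k^2 + (a - b k)^2 P (needs |a - b k| < 1)."""
    rho = a - b * k
    assert abs(rho) < 1.0, "policy must stabilize the system"
    return (q + r * k * k) / (1.0 - rho * rho)


def natural_policy_gradient(k, a, b, q, r, step=0.05, n_steps=2000):
    """Actor updates k <- k - step * E_k, with an exact (model-based) critic."""
    for _ in range(n_steps):
        p = closed_loop_cost(k, a, b, q, r)      # exact critic value P_k
        e_k = (r + b * b * p) * k - b * p * a    # natural gradient direction
        k -= step * e_k
    return k


def riccati_gain(a, b, q, r, n_iter=1000):
    """Reference optimal gain from value iteration on the scalar Riccati equation."""
    p = q
    for _ in range(n_iter):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)
```

Starting from a stabilizing gain, e.g. `natural_policy_gradient(1.0, 1.2, 1.0, 1.0, 1.0)`, the iterates should approach `riccati_gain(1.2, 1.0, 1.0, 1.0)`, because E_k vanishes exactly at the Riccati-optimal gain.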

Preprint · 2022 · arXiv

Universal approximation of symmetric and anti-symmetric functions

We consider universal approximations of symmetric and anti-symmetric functions, which are important for applications in quantum physics, as well as other scientific and engineering computations. We give constructive approximations with explicit bounds on the number of parameters with respect to the dimension and the target accuracy $ε$. While the approximation still suffers from the curse of dimensionality, to the best of our knowledge, these are the first results in the literature with explicit error bounds for functions with symmetry or anti-symmetry constraints.
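A common way to build an anti-symmetric function is to take the determinant of one-particle functions evaluated at each input. Swapping two inputs swaps two rows, so the sign flips. The sketch below uses monomials, for which the determinant reduces to the Vandermonde product ∏_{i<j}(x_j − x_i). It is a standard illustration of the anti-symmetry constraint, not the paper's constructive approximation or its parameter bounds. The function name is hypothetical.

```python
import numpy as np


def antisymmetrize(orbitals, xs):
    """Determinant-based anti-symmetric function det[phi_j(x_i)].

    `orbitals` is a list of n one-particle functions and `xs` holds n scalar
    inputs; swapping any two inputs swaps two rows and flips the sign.
    """
    mat = np.array([[phi(x) for phi in orbitals] for x in xs])
    return np.linalg.det(mat)


# Monomial orbitals 1, x, x^2 give the 3 x 3 Vandermonde determinant.
monomials = [lambda x: 1.0, lambda x: x, lambda x: x * x]
value = antisymmetrize(monomials, [0.3, -1.2, 2.0])
```

Here `value` equals (−1.2 − 0.3)(2.0 − 0.3)(2.0 + 1.2), and it changes sign when the first two inputs are exchanged.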

Preprint · 2021 · arXiv

Complexity of randomized algorithms for underdamped Langevin dynamics

We establish an information complexity lower bound for randomized algorithms simulating underdamped Langevin dynamics. More specifically, we prove that the worst-case $L^2$ strong error is of order $Ω(\sqrt{d}\, N^{-3/2})$ for solving a family of $d$-dimensional underdamped Langevin dynamics by any randomized algorithm with only $N$ queries to $\nabla U$, the driving Brownian motion, and its weighted integration, respectively. The lower bound we establish matches the upper bound for the randomized midpoint method recently proposed by Shen and Lee [NIPS 2019] in terms of both parameters $N$ and $d$.

Preprint · 2021 · arXiv

Existence and computation of generalized Wannier functions for non-periodic systems in two dimensions and higher

Exponentially-localized Wannier functions (ELWFs) are an orthonormal basis of the Fermi projection of a material consisting of functions which decay exponentially fast away from their maxima. When the material is insulating and crystalline, conditions which guarantee existence of ELWFs in dimensions one, two, and three are well-known, and methods for constructing the ELWFs numerically are well-developed. We consider the case where the material is insulating but not necessarily crystalline, where much less is known. In one spatial dimension, Kivelson and Nenciu-Nenciu have proved ELWFs can be constructed as the eigenfunctions of a self-adjoint operator acting on the Fermi projection. In this work, we identify an assumption under which we can generalize the Kivelson-Nenciu-Nenciu result to two dimensions and higher. Under this assumption, we prove that ELWFs can be constructed as the eigenfunctions of a sequence of self-adjoint operators acting on the Fermi projection. We conjecture that the assumption we make is equivalent to vanishing of topological obstructions to the existence of ELWFs in the special case where the material is crystalline. We numerically verify that our construction yields ELWFs in various cases where our assumption holds and provide numerical evidence for our conjecture.

Preprint · 2021 · arXiv

Neural Collapse with Cross-Entropy Loss

We consider the variational problem of cross-entropy loss with $n$ feature vectors on a unit hypersphere in $\mathbb{R}^d$. We prove that when $d \geq n - 1$, the global minimum is given by the simplex equiangular tight frame, which justifies the neural collapse behavior. We also prove that as $n \rightarrow \infty$ with fixed $d$, the minimizing points will distribute uniformly on the hypersphere and show a connection with the frame potential of Benedetto & Fickus.
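The simplex equiangular tight frame has an explicit form. It consists of n unit vectors whose pairwise inner products all equal −1/(n−1). The construction below is the standard one and is not taken from the paper. It centers the standard basis of Rⁿ and rescales each column to unit norm. The columns sum to zero and therefore span an (n−1)-dimensional subspace, which is consistent with the condition d ≥ n − 1 in the abstract. The function name is hypothetical.

```python
import numpy as np


def simplex_etf(n):
    """Columns are n unit vectors with pairwise inner product -1/(n-1)."""
    centered = np.eye(n) - np.ones((n, n)) / n  # subtract the mean of the basis vectors
    return np.sqrt(n / (n - 1)) * centered      # each column had squared norm (n-1)/n
```

For n = 5, the Gram matrix of the columns has ones on the diagonal and −0.25 everywhere else.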

Preprint · 2021 · arXiv

Neural-Network Quantum States for Periodic Systems in Continuous Space

We introduce a family of neural quantum states for the simulation of strongly interacting systems in the presence of spatial periodicity. Our variational state is parameterized in terms of a permutationally-invariant part described by the Deep Sets neural-network architecture. The input coordinates to the Deep Sets are periodically transformed such that they are suitable to directly describe periodic bosonic systems. We show example applications to both one- and two-dimensional interacting quantum gases with Gaussian interactions, as well as to $^4$He confined in a one-dimensional geometry. For the one-dimensional systems we find very precise estimates of the ground-state energies and the radial distribution functions of the particles. In two dimensions we obtain good estimates of the ground-state energies, comparable to results obtained from more conventional methods.
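The construction in the abstract combines two invariances. Deep Sets gives permutation symmetry by applying ρ to a sum of per-particle embeddings φ, and a periodic transformation of the input coordinates gives periodicity. The sketch below uses random weights and assumes a 1-D box of length L with cos/sin harmonic features as the periodic map. The transform, the layer sizes, and the class name are hypothetical, and training and the full wavefunction ansatz are omitted.

```python
import numpy as np


class PeriodicDeepSet:
    """Toy f(x_1, ..., x_N) = rho(sum_i phi(x_i)) with periodic input features."""

    def __init__(self, box_length, n_harmonics=2, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.box_length = box_length
        self.n_harmonics = n_harmonics
        self.w_phi = rng.standard_normal((2 * n_harmonics, hidden))  # phi weights
        self.w_rho = rng.standard_normal(hidden)                     # rho weights

    def _periodic_features(self, x):
        # cos/sin of the first few harmonics: x and x + box_length map to the
        # same features, so the model is periodic in every coordinate.
        k = np.arange(1, self.n_harmonics + 1)
        angle = 2.0 * np.pi * np.outer(x, k) / self.box_length
        return np.concatenate([np.cos(angle), np.sin(angle)], axis=1)

    def __call__(self, x):
        phi = np.tanh(self._periodic_features(x) @ self.w_phi)  # per-particle embedding
        pooled = phi.sum(axis=0)                                # permutation-invariant sum
        return float(np.tanh(pooled) @ self.w_rho)              # rho maps pooled to scalar
```

Reordering the particles, or shifting any coordinate by a multiple of the box length, leaves the output unchanged up to rounding. This is the symmetry required of a bosonic state in a periodic box.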

Preprint · 2021 · arXiv

On explicit $L^2$-convergence rate estimate for piecewise deterministic Markov processes in MCMC algorithms

We establish $L^2$-exponential convergence rate for three popular piecewise deterministic Markov processes for sampling: the randomized Hamiltonian Monte Carlo method, the zigzag process, and the bouncy particle sampler. Our analysis is based on a variational framework for hypocoercivity, which combines a Poincaré-type inequality in time-augmented state space and a standard $L^2$ energy estimate. Our analysis provides explicit convergence rate estimates, which are more quantitative than existing results.

Preprint · 2021 · arXiv

Symmetry Breaking in Density Functional Theory due to Dirac Exchange for a Hydrogen Molecule

We study symmetry breaking in the mean-field solutions to the two-electron hydrogen molecule within Kohn-Sham (KS) local spin density functional theory with Dirac exchange (the XLDA model). This simplified model shows behavior related to that of (KS) spin density functional theory (SDFT) predictions in condensed and molecular systems. The Kohn-Sham solutions to the constrained SDFT variational problem undergo spontaneous symmetry breaking as the relative strength of the non-convex exchange term increases. This changes the molecular ground state from a paramagnetic state to an antiferromagnetic ground state and a stationary symmetric delocalized first excited state. We further characterize the limiting behavior of the minimizer as the strength of the exchange term goes to infinity. This leads to further bifurcations and highly localized states with varying character. The stability of the various solution classes is demonstrated by Hessian analysis. Finite element numerical results provide support for the formal conjectures.

Preprint · 2020 · arXiv

A low-rank Schwarz method for radiative transport equation with heterogeneous scattering coefficient

Random sampling has been used to find low-rank structure and to build fast direct solvers for multiscale partial differential equations of various types. In this work, we design an accelerated Schwarz method for radiative transfer equations that makes use of approximate local solution maps constructed offline via a random sampling strategy. Numerical examples demonstrate the accuracy, robustness, and efficiency of the proposed approach.

Preprint · 2020 · arXiv

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Training deep neural networks with stochastic gradient descent (SGD) often achieves zero training loss on real-world tasks, although the optimization landscape is known to be highly non-convex. To understand the success of SGD for training deep neural networks, this work presents a mean-field analysis of deep residual networks, based on a line of works that interpret the continuum limit of the deep residual network as an ordinary differential equation when the network capacity tends to infinity. Specifically, we propose a new continuum limit of deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global. This characterization enables us to derive the first global convergence result for multilayer neural networks in the mean-field regime. Furthermore, without assuming convexity of the loss landscape, our proof relies on a zero-loss assumption at the global minimizer, which can be achieved when the model possesses a universal approximation property. Key to our result is the observation that a deep residual network resembles a shallow network ensemble, i.e. a two-layer network. We bound the difference between the shallow network and our ResNet model via the adjoint sensitivity method, which enables us to apply existing mean-field analyses of two-layer networks to deep networks. In addition, we propose several novel training schemes based on the new continuous model, including one training procedure that switches the order of the residual blocks and results in strong empirical performance on the benchmark datasets.

Preprint · 2020 · arXiv

A Proximal-Gradient Algorithm for Crystal Surface Evolution

As a counterpoint to recent numerical methods for crystal surface evolution, which agree well with microscopic dynamics but suffer from significant stiffness that prevents simulation on fine spatial grids, we develop a new numerical method based on the macroscopic partial differential equation, leveraging its formal structure as the gradient flow of the total variation energy, with respect to a weighted $H^{-1}$ norm. This gradient flow structure relates to several metric space gradient flows of recent interest, including 2-Wasserstein flows and their generalizations to nonlinear mobilities. We develop a novel semi-implicit time discretization of the gradient flow, inspired by the classical minimizing movements scheme (known as the JKO scheme in the 2-Wasserstein case). We then use a primal dual hybrid gradient (PDHG) method to compute each element of the semi-implicit scheme. In one dimension, we prove convergence of the PDHG method to the semi-implicit scheme, under general integrability assumptions on the mobility and its reciprocal. Finally, by taking finite difference approximations of our PDHG method, we arrive at a fully discrete numerical algorithm, with iterations that converge at a rate independent of the spatial discretization: in particular, the convergence properties do not deteriorate as we refine our spatial grid. We close with several numerical examples illustrating the properties of our method, including facet formation at local maxima, pinning at local minima, and convergence as the spatial and temporal discretizations are refined.

Preprint · 2020 · arXiv

Bloch dynamics with second order Berry phase correction

We derive the semiclassical Bloch dynamics with the second-order Berry phase correction in the presence of a slowly varying scalar potential as a perturbation. Our mathematical derivation is based on a two-scale WKB asymptotic analysis. For a uniform external electric field, the bi-characteristics system after a positional shift introduced by Berry connections agrees with recent results in previous works. Moreover, for the case of a linear external electric field, we show that the extra terms arising in the bi-characteristics system after the positional shift are also gauge independent.

Preprint · 2020 · arXiv

Butterfly-Net: Optimal Function Representation Based on Convolutional Neural Networks

Deep networks, especially convolutional neural networks (CNNs), have been successfully applied in various areas of machine learning as well as to challenging problems in other scientific and engineering fields. This paper introduces Butterfly-Net, a low-complexity CNN with structured and sparse cross-channel connections, together with a Butterfly initialization strategy for a family of networks. Theoretical analysis of the approximation power of Butterfly-Net to the Fourier representation of input data shows that the error decays exponentially as the depth increases. Combining Butterfly-Net with a fully connected neural network, a large class of problems are proved to be well approximated with network complexity depending on the effective frequency bandwidth instead of the input dimension. Regular CNNs are covered as a special case in our analysis. Numerical experiments validate the analytical results on the approximation of Fourier kernels and energy functionals of Poisson's equations. Moreover, all experiments support that training from Butterfly initialization outperforms training from random initialization. Also, adding the remaining cross-channel connections, although it significantly increases the number of parameters, does not much improve the post-training accuracy and makes the network more sensitive to the data distribution.

Preprint · 2020 · arXiv

Continuum limit and preconditioned Langevin sampling of the path integral molecular dynamics

We investigate the continuum limit in which the number of beads goes to infinity in the ring polymer representation of thermal averages. Studying the continuum limit of the trajectory sampling equation sheds light on possible preconditioning techniques for sampling ring polymer configurations with a large number of beads. We propose two preconditioned Langevin sampling dynamics, which are shown to have improved stability and sampling accuracy. We present a careful mode analysis of the preconditioned dynamics and show their connections to the normal mode, the staging coordinate, and the Matsubara mode representations of ring polymers. In the case where the potential is quadratic, we show that the continuum limit of the preconditioned mass-modified Langevin dynamics converges to its equilibrium exponentially fast, which suggests that the finite-dimensional counterpart has a dimension-independent convergence rate. In addition, the preconditioning techniques can be naturally applied to multi-level quantum systems in the nonadiabatic regime and are compatible with various numerical approaches.
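The normal-mode representation mentioned in the abstract diagonalizes the harmonic springs that couple neighboring beads. As a small check, assume unit mass and unit spring frequency ω_P. The Hessian of the free ring-polymer springs is then a circulant matrix with eigenvalues 4 sin²(πk/P), so the normal-mode frequencies are 2 ω_P |sin(πk/P)|. This is the textbook free ring polymer, not the paper's preconditioned dynamics, and the function names are hypothetical.

```python
import numpy as np


def ring_laplacian(n_beads):
    """Hessian of sum_j (q_j - q_{j+1})^2 / 2 with q_{P+1} = q_1 (needs P >= 3)."""
    eye = np.eye(n_beads)
    shift = np.roll(eye, 1, axis=1)  # row j picks out the neighbor q_{j+1}
    return 2.0 * eye - shift - shift.T


def normal_mode_frequencies(n_beads, omega_p=1.0):
    """Analytic ring-polymer normal-mode frequencies 2 * omega_P * |sin(pi k / P)|."""
    k = np.arange(n_beads)
    return 2.0 * omega_p * np.abs(np.sin(np.pi * k / n_beads))
```

The ratio between the fastest and the slowest nonzero mode grows roughly linearly in P (about P/π for large P). This growing stiffness is one way to see why preconditioning helps when the number of beads is large.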

Preprint · 2020 · arXiv

Convergence of Stochastic-extended Lagrangian molecular dynamics method for polarizable force field simulation

Extended Lagrangian molecular dynamics (XLMD) is a general method for performing molecular dynamics simulations using quantum and classical many-body potentials. Recently, several new XLMD schemes have been proposed and tested on several classes of many-body polarization models, such as induced dipoles or Drude charges, by creating an auxiliary set of these same degrees of freedom that are reversibly integrated through time. This gives rise to a singularly perturbed Hamiltonian system that provides a good approximation to the time evolution of the real mutual polarization field. To further improve the accuracy of the XLMD dynamics, and to potentially extend it to other many-body potentials, we introduce a stochastic modification which leads to a set of singularly perturbed Langevin equations with degenerate noise. We prove that the resulting Stochastic-XLMD converges to the accurate dynamics, and that the convergence rate is both optimal and independent of the accuracy of the initial polarization field. We carefully study the scaling of the damping factor and numerical noise for efficient numerical simulation of Stochastic-XLMD, and we demonstrate the effectiveness of the method for model polarizable force field systems.

Preprint · 2020 · arXiv

Defect resonances of truncated crystal structures

Defects in the atomic structure of crystalline materials may spawn electronic bound states, known as defect states, which decay rapidly away from the defect. Simplified models of defect states typically assume the defect is surrounded on all sides by an infinite, perfectly crystalline material. In reality, the surrounding structure must be finite, and in certain contexts the structure can be small enough that edge effects are significant. In this work we investigate these edge effects and prove the following result. Suppose that a one-dimensional infinite crystalline material hosting a positive-energy defect state is truncated a distance $M$ from the defect. Then, for sufficiently large $M$, there exists a resonance exponentially close (in $M$) to the bound state eigenvalue. It follows that the truncated structure hosts a metastable state with an exponentially long lifetime. Our methods allow both the resonance frequency and the associated resonant state to be computed to all orders in $e^{-M}$. We expect this result to be of particular interest in the context of photonic crystals, where defect states are used for wave-guiding and structures are relatively small. Finally, under a mild additional assumption, we prove that if the defect state has negative energy, then the truncated structure hosts a bound state with exponentially close energy.

preprint · 2020 · arXiv

ELSI -- An Open Infrastructure for Electronic Structure Solvers

Routine applications of electronic structure theory to molecules and periodic systems need to compute the electron density from a given Hamiltonian matrix and, for non-orthogonal basis sets, an overlap matrix. System sizes can range from a few to thousands or, in some examples, millions of atoms. Different discretization schemes (basis sets) and different system geometries (finite non-periodic vs. infinite periodic boundary conditions) yield matrices with different structures. The ELectronic Structure Infrastructure (ELSI) project provides an open-source software interface to facilitate the implementation and optimal use of high-performance solver libraries covering cubic-scaling eigensolvers, linear-scaling density-matrix-based algorithms, and other reduced-scaling methods in between. In this paper, we present recent improvements and developments inside ELSI, mainly covering (1) new solvers connected to the interface, (2) matrix layout and communication adapted for parallel calculations of periodic and/or spin-polarized systems, (3) routines for density matrix extrapolation in geometry optimization and molecular dynamics calculations, and (4) general utilities such as parallel matrix I/O and JSON output. The ELSI interface has been integrated into four electronic structure code projects (DFTB+, DGDFT, FHI-aims, SIESTA), allowing us to rigorously benchmark the performance of the solvers on an equal footing. Based on the results of a systematic set of large-scale benchmarks performed with Kohn-Sham density-functional theory and density-functional tight-binding theory, we identify factors that strongly affect the efficiency of the solvers, and propose a decision layer that assists with the solver selection process. Finally, we describe a reverse communication interface encoding matrix-free iterative solver strategies that are suitable, e.g., for use with planewave basis sets.

preprint · 2020 · arXiv

End-to-end Learning for Inter-Vehicle Distance and Relative Velocity Estimation in ADAS with a Monocular Camera

Inter-vehicle distance and relative velocity estimation are two basic functions of any advanced driver-assistance system (ADAS). In this paper, we propose a monocular camera-based inter-vehicle distance and relative velocity estimation method based on end-to-end training of a deep neural network. The key novelty of our method is the integration of multiple visual cues provided by any two time-consecutive monocular frames, including a deep feature cue, a scene geometry cue, and a temporal optical flow cue. We also propose a vehicle-centric sampling mechanism to alleviate the effect of perspective distortion in the motion field (i.e., optical flow). We implement the method with a lightweight deep neural network. Extensive experiments confirm the superior performance of our method over other state-of-the-art methods in terms of estimation accuracy, computational speed, and memory footprint.

preprint · 2020 · arXiv

Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds

Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics. In this work, we consider the problem of estimating the normalizing constant $Z=\int_{\mathbb{R}^d} e^{-f(x)}\,\mathrm{d}x$ to within a multiplicative factor of $1 \pm \varepsilon$ for a $\mu$-strongly convex and $L$-smooth function $f$, given query access to $f(x)$ and $\nabla f(x)$. We give both algorithms and lower bounds for this problem. Using an annealing algorithm combined with a multilevel Monte Carlo method based on underdamped Langevin dynamics, we show that $\widetilde{\mathcal{O}}\Bigl(\frac{d^{4/3}\kappa + d^{7/6}\kappa^{7/6}}{\varepsilon^2}\Bigr)$ queries to $\nabla f$ are sufficient, where $\kappa = L / \mu$ is the condition number. Moreover, we provide an information-theoretic lower bound, showing that at least $\frac{d^{1-o(1)}}{\varepsilon^{2-o(1)}}$ queries are necessary. This provides a first nontrivial lower bound for the problem.

preprint · 2020 · arXiv

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

While pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018) and GPT-2 (Radford et al., 2019), have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders their practical online use. In this paper, we propose LightPAFF, a Lightweight Pre-training And Fine-tuning Framework that leverages two-stage knowledge distillation to transfer knowledge from a big teacher model to a lightweight student model in both the pre-training and fine-tuning stages. In this way, the lightweight model can achieve accuracy similar to that of the big teacher model, but with far fewer parameters and thus faster online inference. LightPAFF can support different pre-training methods (such as BERT, GPT-2 and MASS (Song et al., 2019)) and be applied to many downstream tasks. Experiments on three language understanding tasks, three language modeling tasks and three sequence-to-sequence generation tasks demonstrate that, while achieving accuracy similar to the big BERT, GPT-2 and MASS models, LightPAFF reduces the model size by nearly 5x and improves online inference speed by 5x-7x.
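
The abstract does not specify LightPAFF's distillation loss. A common choice combines two terms:

- a KL divergence that pulls the student's output distribution toward the teacher's;
- an ordinary cross-entropy on the ground-truth labels.

This combination is assumed here rather than taken from the paper. The sketch computes that loss in NumPy for a batch of logits. The mixing weight `alpha` is a hypothetical parameter. Temperature softening, which many distillation setups add, is omitted.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Batch mean of alpha * KL(teacher || student) + (1 - alpha) * CE(labels, student).

    The weighting is an illustrative assumption, not LightPAFF's published objective.
    """
    log_p_s = log_softmax(np.asarray(student_logits, dtype=float))
    log_p_t = log_softmax(np.asarray(teacher_logits, dtype=float))
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    ce = -log_p_s[np.arange(len(labels)), labels]
    return float(np.mean(alpha * kl + (1.0 - alpha) * ce))


teacher = np.array([[2.0, 0.5, -1.0]])  # toy teacher logits for one example
student = np.zeros((1, 3))              # an untrained student: uniform prediction
print(distillation_loss(student, teacher, np.array([0]), alpha=0.5))
```

Setting `alpha=1.0` gives pure distillation, which vanishes when the student's logits match the teacher's. Setting `alpha=0.0` recovers plain supervised training.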

preprint · 2020 · arXiv

Neural Machine Translation with Error Correction

During training, neural machine translation (NMT) predicts the next target token given the previous ground-truth target tokens as input, whereas during inference it is given its own previously generated tokens. This causes a discrepancy between training and inference, as well as error propagation, which affects translation accuracy. In this paper, we introduce an error correction mechanism into NMT, which corrects the error information in the previously generated tokens to better predict the next token. Specifically, we introduce two-stream self-attention from XLNet into the NMT decoder, where the query stream is used to predict the next token and the content stream is used to correct the error information from the previously predicted tokens. We leverage scheduled sampling to simulate prediction errors during training. Experiments on three IWSLT translation datasets and two WMT translation datasets demonstrate that our method achieves improvements over the Transformer baseline and scheduled sampling. Further experimental analyses also verify the effectiveness of the proposed error correction mechanism in improving translation quality.
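
Scheduled sampling, used above to simulate prediction errors during training, can be sketched as follows. At each decoder position, the model keeps the ground-truth previous token with probability p and otherwise substitutes its own prediction. The probability p decays over training. The inverse-sigmoid decay $k/(k + e^{\text{step}/k})$ below is one schedule proposed in the original scheduled sampling work of Bengio et al. (2015). It is an assumption here, since the abstract does not state which schedule the paper uses. The token ids and `k` are illustrative.

```python
import numpy as np


def teacher_forcing_prob(step: int, k: float = 1000.0) -> float:
    """Inverse sigmoid decay k / (k + exp(step / k)): chance of feeding the gold token."""
    return float(k / (k + np.exp(step / k)))


def mix_decoder_inputs(gold, predicted, p_gold, rng):
    """Per position, keep gold[t] with probability p_gold, otherwise use predicted[t]."""
    gold = np.asarray(gold)
    predicted = np.asarray(predicted)
    use_gold = rng.random(gold.shape) < p_gold
    return np.where(use_gold, gold, predicted)


rng = np.random.default_rng(0)
gold = np.array([5, 17, 42, 8, 3])       # ground-truth previous tokens (toy ids)
predicted = np.array([5, 11, 42, 9, 3])  # model's own previous predictions (toy ids)
for step in (0, 5000, 20000):
    p = teacher_forcing_prob(step)
    print(step, round(p, 3), mix_decoder_inputs(gold, predicted, p, rng))
```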

preprint · 2020 · arXiv

Non-Convex Planar Harmonic Maps

We formulate a novel characterization of a family of invertible maps between two-dimensional domains. Our work follows two classic results. The first is the Radó-Kneser-Choquet (RKC) theorem, which establishes the invertibility of harmonic maps into a convex planar domain. The second is Tutte's embedding theorem for planar graphs (RKC's discrete counterpart), which proves the invertibility of piecewise linear maps of triangulated domains into a convex planar polygon when the maps satisfy a discrete-harmonic principle. In both theorems, the convexity of the target domain is essential for ensuring invertibility. We extend these characterizations, in both the continuous and discrete cases, by replacing convexity with a less restrictive condition. In the continuous case, Alessandrini and Nesi provide a characterization of invertible harmonic maps into non-convex domains with a smooth boundary by imposing additional conditions on orientation preservation along the boundary. We extend their results by defining a condition on the normal derivatives along the boundary, which we call the cone condition; this condition is tractable and geometrically intuitive, encoding a weak notion of local invertibility. The cone condition enables us to extend the results of Alessandrini and Nesi to harmonic maps into non-convex domains with a piecewise-smooth boundary. In the discrete case, we use an analog of the cone condition to characterize invertible discrete-harmonic piecewise-linear maps of triangulations. This gives an analog of our continuous results and characterizes invertible discrete-harmonic maps in terms of the orientation of triangles incident on the boundary.
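
The classical convex case that the paper generalizes can be illustrated with Tutte's embedding. In this construction:

1. The boundary vertices are pinned to a convex polygon.
2. Each interior vertex is placed at the average of its neighbors, which is a linear solve with the graph Laplacian.
3. Tutte's theorem then guarantees that every triangle keeps its orientation.

This minimal sketch assumes a hypothetical 4 x 4 grid triangulation, uniform edge weights, and boundary points on the unit circle. It checks invertibility through the sign of each triangle's area. It does not implement the paper's cone condition for non-convex targets.

```python
import numpy as np


def grid_triangulation(n):
    """Split an n x n grid (vertex i * n + j at x = j, y = i) into CCW triangles."""
    tris = []
    for i in range(n - 1):
        for j in range(n - 1):
            v00, v01 = i * n + j, i * n + j + 1
            v10, v11 = (i + 1) * n + j, (i + 1) * n + j + 1
            tris += [(v00, v01, v11), (v00, v11, v10)]
    return np.array(tris)


def boundary_loop(n):
    """Boundary vertices of the grid in counterclockwise order."""
    bottom = list(range(n))
    right = [i * n + n - 1 for i in range(1, n)]
    top = [(n - 1) * n + j for j in range(n - 2, -1, -1)]
    left = [i * n for i in range(n - 2, 0, -1)]
    return bottom + right + top + left


def tutte_embedding(n_vertices, triangles, boundary):
    """Pin the boundary to a convex polygon on the unit circle and place every
    interior vertex at the uniform average of its neighbors."""
    adj = np.zeros((n_vertices, n_vertices))
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u, v] = adj[v, u] = 1.0
    lap = np.diag(adj.sum(axis=1)) - adj
    angles = 2.0 * np.pi * np.arange(len(boundary)) / len(boundary)
    pos = np.zeros((n_vertices, 2))
    pos[boundary] = np.column_stack([np.cos(angles), np.sin(angles)])
    interior = np.setdiff1d(np.arange(n_vertices), boundary)
    # Discrete harmonicity of the interior: L_II x_I = -L_IB x_B
    rhs = -lap[np.ix_(interior, boundary)] @ pos[boundary]
    pos[interior] = np.linalg.solve(lap[np.ix_(interior, interior)], rhs)
    return pos


def signed_areas(pos, triangles):
    """Signed area of each mapped triangle; all positive means orientation is preserved."""
    p0, p1, p2 = (pos[triangles[:, k]] for k in range(3))
    e1, e2 = p1 - p0, p2 - p0
    return 0.5 * (e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])


tris = grid_triangulation(4)
boundary = boundary_loop(4)
pos = tutte_embedding(16, tris, boundary)
print(signed_areas(pos, tris).min() > 0)
```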

preprint · 2020 · arXiv

Optimal Orbital Selection for Full Configuration Interaction (OptOrbFCI): Pursuing the Basis Set Limit under a Budget

Full configuration interaction (FCI) solvers are limited to small basis sets due to their high computational cost. An optimal orbital selection for FCI (OptOrbFCI) is proposed to boost the power of existing FCI solvers to pursue the basis set limit under a computational budget. The optimization problem coincides with that of the complete active space SCF method (CASSCF), although OptOrbFCI is algorithmically quite different. OptOrbFCI proceeds in three steps:

1. It finds an optimal rotation matrix by directly solving a constrained optimization problem, which compresses the orbitals of a large basis set to a set of manageable size.
2. It conducts FCI calculations only on the rotated orbital sets.
3. It produces a variational ground-state energy and its wave function.

Combining OptOrbFCI with coordinate descent full configuration interaction (CDFCI), we demonstrate the efficiency and accuracy of the method on the carbon dimer and nitrogen dimer with basis sets up to cc-pV5Z. We also benchmark the binding curve of the nitrogen dimer under the cc-pVQZ basis set with 28 selected orbitals, which yields consistently lower ground-state energies than the FCI results under the cc-pVDZ basis set. The dissociation energy in this case is found to be more accurate.

preprint · 2020 · arXiv

Random Sampling and Efficient Algorithms for Multiscale PDEs

We describe a numerical framework that uses random sampling to efficiently capture low-rank local solution spaces of multiscale PDE problems arising in domain decomposition. In contrast to existing techniques, our method does not rely on a detailed analytical understanding of specific multiscale PDEs, in particular their asymptotic limits. We apply the framework to two examples: a linear kinetic equation and an elliptic equation with rough media. On these examples, the framework achieves the asymptotic-preserving property for the kinetic equation and numerical homogenization for the elliptic equation.
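
The random-sampling idea can be illustrated with a generic randomized range finder, in the style of Halko, Martinsson and Tropp. The method applies the operator to a few random inputs and orthonormalizes the outputs. This minimal sketch assumes the local solution map is available as a black-box linear operator. A dense, exactly low-rank matrix stands in for it, and none of the paper's kinetic or elliptic solvers are reproduced. The `oversample` value is an illustrative choice.

```python
import numpy as np


def randomized_range(apply_op, n_inputs, rank, oversample=5, rng=None):
    """Orthonormal basis for the (approximate) range of a linear operator.

    apply_op maps an (n_inputs, m) array of random inputs to the (n_outputs, m) array
    of outputs; for a local PDE problem this would be a hypothetical local solver.
    """
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.standard_normal((n_inputs, rank + oversample))  # random samples
    y = apply_op(omega)                                          # sampled outputs
    q, _ = np.linalg.qr(y)                                       # orthonormalize
    return q


rng = np.random.default_rng(0)
a = rng.standard_normal((60, 4)) @ rng.standard_normal((4, 40))  # exactly rank 4
q = randomized_range(lambda x: a @ x, a.shape[1], rank=4, rng=rng)
print(np.linalg.norm(a - q @ (q.T @ a)) / np.linalg.norm(a))      # relative residual
```

Only `rank + oversample` operator applications are needed. This mirrors the point that a small number of random local solves can span a low-rank solution space.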

preprint · 2020 · arXiv

Stable Phase Retrieval from Locally Stable and Conditionally Connected Measurements

This paper is concerned with stable phase retrieval for a family of phase retrieval models we name "locally stable and conditionally connected" (LSCC) measurement schemes. For every signal $f$, we associate a corresponding weighted graph $G_f$, defined by the LSCC measurement scheme, and show that the phase retrievability of the signal $f$ is determined by the connectivity of $G_f$. We then characterize the phase retrieval stability of the signal $f$ by two measures that are commonly used in graph theory to quantify graph connectivity: the Cheeger constant of $G_f$ for real-valued signals, and the algebraic connectivity of $G_f$ for complex-valued signals. We use our results to study the stability of two phase retrieval models that can be cast as LSCC measurement schemes, and focus on understanding for which signals the "curse of dimensionality" can be avoided. The first model we discuss is a finite-dimensional model for locally supported measurements such as the windowed Fourier transform. For signals "without large holes", we show the stability constant exhibits only a mild polynomial growth in the dimension, in stark contrast with the exponential growth which uniform stability constants tend to suffer from; more precisely, in $\mathbb{R}^d$ the constant grows proportionally to $d^{1/2}$, while in $\mathbb{C}^d$ it grows proportionally to $d$. We also show the growth of the constant in the complex case cannot be reduced, suggesting that complex phase retrieval is substantially more difficult than real phase retrieval. The second model we consider is an infinite-dimensional phase retrieval problem in a principal shift invariant space. We show that despite the infinite dimensionality of this model, signals with monotone exponential decay will have a finite stability constant. In contrast, the stability bound provided by our results will be infinite if the signal's decay is polynomial.
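
The link between "large holes" and connectivity can be made concrete with the algebraic connectivity, i.e. the second-smallest eigenvalue of the weighted graph Laplacian. It is zero exactly when the graph is disconnected. The actual edge weights of $G_f$ come from the LSCC measurement scheme and are not given in the abstract. This sketch therefore uses a hypothetical stand-in:

- nodes are the indices in the support of $f$;
- two nodes are joined when their indices are within a window;
- each edge has weight $|f_i||f_j|$.

```python
import numpy as np


def signal_graph(f, window):
    """Hypothetical weighted graph on the support of f.

    Nodes are indices with f_i != 0; nodes i, j are joined when |i - j| <= window,
    with weight |f_i| * |f_j|. This only mimics a locally supported measurement scheme.
    """
    f = np.asarray(f)
    support = np.flatnonzero(np.abs(f) > 0)
    w = np.zeros((len(support), len(support)))
    for a in range(len(support)):
        for b in range(a + 1, len(support)):
            if support[b] - support[a] <= window:
                w[a, b] = w[b, a] = abs(f[support[a]]) * abs(f[support[b]])
    return w


def algebraic_connectivity(w):
    """Second-smallest eigenvalue of the weighted Laplacian (zero when disconnected)."""
    lap = np.diag(w.sum(axis=1)) - w
    return float(np.linalg.eigvalsh(lap)[1])


print(algebraic_connectivity(signal_graph([1, 2, 1, 1], window=1)))        # connected
print(algebraic_connectivity(signal_graph([1, 1, 0, 0, 1, 1], window=1)))  # hole too wide
print(algebraic_connectivity(signal_graph([1, 1, 0, 0, 1, 1], window=3)))  # hole bridged
```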

preprint · 2020 · arXiv

Tensor Ring Decomposition: Optimization Landscape and One-loop Convergence of Alternating Least Squares

In this work, we study the tensor ring decomposition and its associated numerical algorithms. We establish a sharp transition in the algorithmic difficulty of the optimization problem as the bond dimension increases. On one hand, we show the existence of spurious local minima in the optimization landscape even when the tensor ring format is heavily over-parameterized, i.e., with a bond dimension much larger than that of the true target tensor. On the other hand, when the bond dimension is increased further, we establish one-loop convergence of the alternating least squares algorithm for tensor ring decomposition. The theoretical results are complemented by numerical experiments on both the local minima and the one-loop convergence of the alternating least squares algorithm.
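
The following sketch shows how a tensor ring and one alternating least squares (ALS) loop fit together. It assumes an order-3 tensor with cores of shape (r, n, r); these shapes and the helper names are hypothetical. Each entry of the tensor is the trace of a product of core slices. Updating one core with the other two fixed is an exact linear least-squares problem. The ring's cyclic symmetry lets a single update routine handle every core. The error is therefore non-increasing across updates. The sketch makes no attempt to reproduce the paper's one-loop convergence result.

```python
import numpy as np


def tr_full(g1, g2, g3):
    """Order-3 tensor ring: T[i, j, k] = trace(G1[:, i, :] @ G2[:, j, :] @ G3[:, k, :])."""
    return np.einsum("aib,bjc,cka->ijk", g1, g2, g3)


def update_first_core(t, g2, g3):
    """Exact least-squares update of the first core with the other two held fixed."""
    m = np.einsum("bjc,cka->bajk", g2, g3)  # subchain M[b, a, j, k]
    r_b, r_a, n2, n3 = m.shape
    # Unfold: T[i, (j, k)] = sum_{a, b} G1[a, i, b] * M[b, a, j, k]
    m_mat = m.transpose(1, 0, 2, 3).reshape(r_a * r_b, n2 * n3)
    t_mat = t.reshape(t.shape[0], n2 * n3)
    sol, *_ = np.linalg.lstsq(m_mat.T, t_mat.T, rcond=None)  # shape (r_a * r_b, n1)
    return sol.T.reshape(t.shape[0], r_a, r_b).transpose(1, 0, 2)


def als_sweep(t, g1, g2, g3):
    """One ALS loop: update each core in turn, using the cyclic symmetry of the ring."""
    g1 = update_first_core(t, g2, g3)
    g2 = update_first_core(t.transpose(1, 2, 0), g3, g1)
    g3 = update_first_core(t.transpose(2, 0, 1), g1, g2)
    return g1, g2, g3


def rel_error(t, cores):
    return np.linalg.norm(tr_full(*cores) - t) / np.linalg.norm(t)


rng = np.random.default_rng(0)
n, r_true, r_fit = 6, 2, 3
target = tr_full(*[rng.standard_normal((r_true, n, r_true)) for _ in range(3)])
cores = tuple(rng.standard_normal((r_fit, n, r_fit)) for _ in range(3))
print("initial error:", rel_error(target, cores))
for _ in range(50):
    cores = als_sweep(target, *cores)
print("error after 50 loops:", rel_error(target, cores))
```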

preprint · 2020 · arXiv

The Iterated Projected Position Algorithm for Constructing Exponentially Localized Generalized Wannier Functions for Periodic and Non-Periodic Insulators in Two Dimensions and Higher

Localized bases play an important role in understanding electronic structure. In periodic insulators, a natural choice of localized basis is given by the Wannier functions, which depend on a choice of unitary transform known as a gauge transformation. Over the past few decades, many works have focused on optimizing the choice of gauge so that the corresponding Wannier functions are maximally localized or reflect some symmetry of the underlying system. In this work, we consider fully non-periodic materials, where the usual Wannier functions are not well defined and gauge optimization is impossible. To tackle the problem of calculating exponentially localized generalized Wannier functions in both periodic and non-periodic systems, we discuss the "Iterated Projected Position (IPP)" algorithm. The IPP algorithm is based on matrix diagonalization; therefore, unlike optimization-based approaches, it does not require initialization and cannot get stuck at a local minimum. Furthermore, a rigorous analysis guarantees that the IPP algorithm produces exponentially localized functions under certain mild assumptions. We numerically demonstrate that the IPP algorithm can be used to calculate exponentially localized bases for the Haldane model, the Kane-Mele model (in both the $\mathbb{Z}_2$-invariant even and $\mathbb{Z}_2$-invariant odd phases), and the $p_x + i p_y$ model on a quasi-crystal lattice.

preprint · 2019 · arXiv

A stochastic version of Stein Variational Gradient Descent for efficient sampling

In this work, we propose RBM-SVGD, a stochastic version of the Stein Variational Gradient Descent (SVGD) method for efficiently sampling from a given probability measure, which is thus useful for Bayesian inference. The method applies the Random Batch Method (RBM) for interacting particle systems, proposed by Jin et al., to the interacting particle system in SVGD. While preserving the behavior of SVGD, it reduces the computational cost, especially when the interaction kernel has long range. Numerical examples verify the efficiency of this new version of SVGD.
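
The random-batch idea can be sketched for 1-D particles with an RBF kernel. At every step, the particles are shuffled into small batches, and the usual SVGD update is computed using interactions within each batch only. For batch size p, this costs O(np) kernel evaluations per step instead of O(n^2). The following are illustrative choices, not values from the paper:

- the bandwidth `h`, the step size, and the batch size;
- the standard normal target;
- keeping the self-interaction term inside each batch, which may differ from the paper's exact scheme.

```python
import numpy as np


def svgd_direction(x, grad_log_p, h):
    """SVGD direction for 1-D particles with RBF kernel k(y, x) = exp(-(y - x)^2 / (2 h^2)).

    phi(x_i) = mean_j [ k(x_j, x_i) * grad_log_p(x_j) + d/dx_j k(x_j, x_i) ].
    """
    diff = x[:, None] - x[None, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2.0 * h**2))     # k[j, i]
    grad_k = -diff / h**2 * k               # derivative of k[j, i] with respect to x_j
    return (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)


def rbm_svgd_step(x, grad_log_p, h, step, batch_size, rng):
    """Shuffle particles into random batches; particles interact only within their batch."""
    perm = rng.permutation(len(x))
    update = np.empty_like(x)
    for start in range(0, len(x), batch_size):
        idx = perm[start:start + batch_size]
        update[idx] = svgd_direction(x[idx], grad_log_p, h)
    return x + step * update


def grad_log_p(z):
    """Score of the standard normal target in 1-D."""
    return -z


rng = np.random.default_rng(0)
x = 3.0 + 0.5 * rng.standard_normal(64)  # particles start far from the target
for _ in range(500):
    x = rbm_svgd_step(x, grad_log_p, h=1.0, step=0.1, batch_size=8, rng=rng)
print(x.mean(), x.std())
```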

preprint · 2019 · arXiv

Computing edge states without hard truncation

We present a numerical method that accurately computes the discrete spectrum and associated bound states of Hamiltonians that model electronic "edge" states localized at boundaries of one- and two-dimensional crystalline materials. The problem is non-trivial because arbitrarily large, finite "hard" truncations of the Hamiltonian in the infinite bulk direction tend to produce spurious bound states partially supported at the truncation. Our method, which overcomes this difficulty, computes the Green's function of the Hamiltonian by imposing an appropriate boundary condition in the bulk direction; the spectral data are then recovered via Riesz projection. We demonstrate our method's effectiveness through studies of edge states at a graphene zig-zag edge in the presence of defects, modeled both by a discrete tight-binding model and by a continuum PDE model under finite difference discretization. Our method may also be used to study states localized at domain-wall-type edges in one- and two-dimensional materials, where the edge Hamiltonian is infinite in both directions; we demonstrate this for a tight-binding model of distinct honeycomb structures joined along a zig-zag edge.
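
The Riesz projection step can be illustrated on its own. It integrates the resolvent $(zI - H)^{-1}$ around a contour that encloses the eigenvalues of interest. The trapezoidal rule on a circle converges very quickly for this integrand. In this sketch, a small dense Hermitian matrix stands in for the Hamiltonian. The paper's Green's function with its bulk boundary condition is not reproduced, and the contour center, radius and node count are illustrative.

```python
import numpy as np


def riesz_projection(h, center, radius, n_nodes=64):
    """Approximate P = (1 / 2 pi i) * contour integral of (z I - H)^{-1} dz.

    The contour is the circle |z - center| = radius, traversed counterclockwise and
    discretized with the trapezoidal rule on n_nodes equispaced points.
    """
    dim = h.shape[0]
    eye = np.eye(dim)
    p = np.zeros((dim, dim), dtype=complex)
    for theta in 2.0 * np.pi * np.arange(n_nodes) / n_nodes:
        z = center + radius * np.exp(1j * theta)
        dz = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / n_nodes)
        p += np.linalg.solve(z * eye - h, eye) * dz
    return p / (2j * np.pi)


rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
h = q @ np.diag([-2.0, 0.5, 3.0, 4.0, 6.0]) @ q.T  # toy Hermitian "Hamiltonian"
p = riesz_projection(h, center=0.5, radius=1.0)     # contour encloses only 0.5
print(np.trace(p).real)                             # approximately 1, the rank of P
```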

preprint · 2019 · arXiv

Coordinate-wise descent methods for leading eigenvalue problem

Leading eigenvalue problems for large-scale matrices arise in many applications. In this work, we consider coordinate-wise descent methods for such problems, based on a reformulation of the leading eigenvalue problem as a non-convex optimization problem. The convergence of several coordinate-wise methods is analyzed and compared. Numerical examples of applications to quantum many-body problems demonstrate the efficiency of the proposed coordinate-wise descent methods and provide benchmarks for them.
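
The abstract does not state the reformulation, so the following one is assumed: minimize $f(x) = \|A + x x^\top\|_F^2$ for a symmetric matrix $A$ whose smallest eigenvalue $\lambda_{\min}$ is negative. The global minimizers are then $x = \pm\sqrt{-\lambda_{\min}}\, v_{\min}$, so $-\|x\|^2$ recovers $\lambda_{\min}$. Along a single coordinate, $f$ is a quartic polynomial. An exact line search therefore only needs the roots of a cubic. This sketch runs cyclic coordinate descent on a small dense test matrix. It illustrates the idea only and is not one of the specific methods analyzed in the paper.

```python
import numpy as np


def objective(a, x):
    """f(x) = ||A + x x^T||_F^2 (assumed reformulation of the eigenvalue problem)."""
    return np.linalg.norm(a + np.outer(x, x)) ** 2


def coordinate_descent(a, x0, n_sweeps=200):
    """Cyclic coordinate descent with exact line search along each coordinate."""
    x = np.array(x0, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(x)):
            s = x @ x
            # (1/4) d/dt f(x + t e_i) = t^3 + 3 x_i t^2 + (s + 2 x_i^2 + A_ii) t + (A x)_i + s x_i
            coeffs = [1.0, 3.0 * x[i], s + 2.0 * x[i] ** 2 + a[i, i], a[i] @ x + s * x[i]]
            # Candidate steps: real parts of the cubic's roots, plus t = 0 (no move).
            candidates = np.append(np.roots(coeffs).real, 0.0)

            def value(t):
                y = x.copy()
                y[i] += t
                return objective(a, y)

            x[i] += min(candidates, key=value)
    return x


rng = np.random.default_rng(0)
b = rng.standard_normal((6, 6))
a = np.diag([-3.0, -1.0, 0.5, 1.0, 2.0, 3.0]) + 0.05 * (b + b.T)  # symmetric test matrix
x = coordinate_descent(a, rng.standard_normal(6))
print(-(x @ x), np.linalg.eigvalsh(a)[0])  # estimate vs. smallest eigenvalue
```

Including the no-move step $t = 0$ among the candidates guarantees that $f$ never increases. The true line minimizer is always a real root of the cubic, so it is among the candidates.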

preprint · 2019 · arXiv

Dirac operators and domain walls

We study the eigenvalue problem for a one-dimensional Dirac operator with a spatially varying "mass" term. It is well known that the mass function may take the form of a kink, or domain wall, transitioning between strictly positive and strictly negative asymptotic mass, $\pm\kappa_\infty$, at $\pm\infty$. In that case, the Dirac operator has a simple eigenvalue of zero energy (geometric multiplicity equal to one) within a gap in the continuous spectrum, with a corresponding zero mode, an exponentially localized eigenfunction. We prove that when the mass function has the form of two domain walls separated by a sufficiently large distance $2\delta$, the Dirac operator has two real simple eigenvalues of opposite sign and of order $e^{-2|\kappa_\infty|\delta}$. The associated eigenfunctions are, up to $L^2$ error of order $e^{-2|\kappa_\infty|\delta}$, linear combinations of shifted copies of the single domain wall zero mode. For the case of three domain walls, there are two non-zero simple eigenvalues as above and a simple eigenvalue at energy zero. Our methods are based on a Lyapunov-Schmidt reduction strategy, and we outline their natural extension to the case of $n$ domain walls for which the minimal distance between domain walls is sufficiently large. The class of Dirac operators we consider controls the bifurcation of topologically protected "edge states" from Dirac points (linear band crossings) for classes of Schrödinger operators with domain-wall-modulated periodic potentials in one and two space dimensions. The present results may be used to construct a rich class of defect modes in periodic structures modulated by multiple domain walls.

preprint · 2019 · arXiv

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article, we present an extension that can efficiently explore target distributions with discontinuous densities. In particular, our extension enables efficient sampling from ordinal parameters through the embedding of probability mass functions into continuous spaces. We motivate our approach through a theory of discontinuous Hamiltonian dynamics and develop a corresponding numerical solver. The proposed solver is the first of its kind, with a remarkable ability to exactly preserve the Hamiltonian. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.

preprint · 2019 · arXiv

Fisher information regularization schemes for Wasserstein gradient flows

We propose a variational scheme for computing Wasserstein gradient flows. The scheme builds upon the Jordan-Kinderlehrer-Otto framework with the Benamou-Brenier dynamic formulation of the quadratic Wasserstein metric and adds a regularization by the Fisher information. This regularization can be derived in terms of energy splitting and is closely related to the Schrödinger bridge problem. It improves the convexity of the variational problem and automatically preserves the non-negativity of the solution. As a result, it allows us to apply sequential quadratic programming to solve the sub-optimization problem. We further reduce the computational cost by showing that no additional time interpolation is needed in the underlying dynamic formulation of the Wasserstein-2 metric; therefore, the dimension of the problem is vastly reduced. Several numerical examples are provided, including the porous media equation, the nonlinear Fokker-Planck equation, the aggregation diffusion equation, and the Derrida-Lebowitz-Speer-Spohn equation. These examples demonstrate the simplicity and stability of the proposed scheme.

preprint · 2019 · arXiv

Stochastic modified equations for the asynchronous stochastic gradient descent

We propose stochastic modified equations (SME) for modeling asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous-time limit, as well as the SME's precise prediction of the trajectories of ASGD with various forcing terms. As an application of the SME, we propose an optimal mini-batching strategy for ASGD by solving the optimal control problem of the associated SME.

preprint · 2017 · arXiv

Bold Diagrammatic Monte Carlo in the Lens of Stochastic Iterative Methods

This work aims to understand bold diagrammatic Monte Carlo (BDMC) methods for the stochastic summation of Feynman diagrams from the perspective of stochastic iterative methods. The convergence enhancement trick of BDMC is investigated through an analysis of the condition number and convergence of the stochastic iterative methods. Numerical experiments are carried out on model systems to compare BDMC with related stochastic iterative approaches.

preprint · 2016 · arXiv

Wavepackets in inhomogeneous periodic media: effective particle-field dynamics and Berry curvature

We consider a model of an electron in a crystal moving under the influence of an external electric field: Schrödinger's equation with a potential which is the sum of a periodic function and a general smooth function. We identify two dimensionless parameters: (re-scaled) Planck's constant and the ratio of the lattice spacing to the scale of variation of the external potential. We consider the special case where both parameters are equal and denote this parameter $\varepsilon$. In the limit $\varepsilon \downarrow 0$, we prove the existence of solutions known as semiclassical wavepackets which are asymptotic up to 'Ehrenfest time' $t \sim \ln(1/\varepsilon)$. To leading order, the center of mass and average quasi-momentum of these solutions evolve along trajectories generated by the classical Hamiltonian given by the sum of the Bloch band energy and the external potential. We then derive all corrections to the evolution of these observables proportional to $\varepsilon$. The corrections depend on the gauge-invariant Berry curvature of the Bloch band, and a coupling to the evolution of the wave-packet envelope which satisfies Schrödinger's equation with a time-dependent harmonic oscillator Hamiltonian. This infinite-dimensional coupled 'particle-field' system may be derived from an 'extended' $\varepsilon$-dependent Hamiltonian. It is known that such coupling of observables (discrete particle-like degrees of freedom) to the wave-envelope (continuum field-like degrees of freedom) can have a significant impact on the overall dynamics.