Source author record

Weinan E

Weinan E appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

46 works

23 topics

4 close collaborators

preprint · 2026 · arXiv

Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base

Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step-wise justifications and inhibits cross-domain links by collapsing the very pathways that establish the logical and causal connections between concepts. We introduce a scalable framework that decompresses scientific reasoning, constructing a verifiable Long Chain-of-Thought (LCoT) knowledge base and projecting it into an emergent encyclopedia, SciencePedia. Our pipeline operationalizes an endpoint-driven, reductionist strategy: a Socratic agent, guided by a curriculum of around 200 courses, generates approximately 3 million first-principles questions. To ensure high fidelity, multiple independent solver models generate LCoTs, which are then rigorously filtered by prompt sanitization and cross-model answer consensus, retaining only those with verifiable endpoints. This verified corpus powers the Brainstorm Search Engine, which performs inverse knowledge search -- retrieving diverse, first-principles derivations that culminate in a target concept. This engine, in turn, feeds the Plato synthesizer, which narrates these verified chains into coherent articles. The initial SciencePedia comprises approximately 200,000 fine-grained entries spanning mathematics, physics, chemistry, biology, engineering, and computation. In evaluations across six disciplines, Plato-synthesized articles (conditioned on retrieved LCoTs) exhibit substantially higher knowledge-point density and significantly lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM). Built on this verifiable LCoT knowledge base, this reasoning-centric approach enables trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia.
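The cross-model answer consensus step mentioned in this abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the data layout, the function name `consensus_filter`, the light string normalization, and the `min_agreement` threshold are all hypothetical, since the abstract does not specify how agreement is measured.

```python
from collections import Counter


def consensus_filter(answers_by_question, min_agreement=2):
    """Keep questions whose most common final answer is shared by enough solvers.

    answers_by_question: dict mapping a question id to the list of final
    answers produced by independent solvers (hypothetical data layout).
    min_agreement: assumed threshold; the abstract does not state one.
    """
    kept = {}
    for qid, answers in answers_by_question.items():
        if not answers:
            continue
        # Normalize lightly so trivially different strings still match.
        normalized = [a.strip().lower() for a in answers]
        answer, votes = Counter(normalized).most_common(1)[0]
        if votes >= min_agreement:
            kept[qid] = answer
    return kept


# Toy input: q1 has two agreeing solvers, q2 has none.
sample = {
    "q1": ["42", "42 ", "41"],
    "q2": ["x = 1", "x = 2", "x = 3"],
}
verified = consensus_filter(sample)
```

In this toy input only `q1` survives, because no two solvers agree on `q2`.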

preprint · 2025 · arXiv

Progressive Optimal Path Sampling for Closed-Loop Optimal Control Design with Deep Neural Networks

Closed-loop optimal control design for high-dimensional nonlinear systems has been a long-standing challenge. Traditional methods, such as solving the associated Hamilton-Jacobi-Bellman equation, suffer from the curse of dimensionality. Recent literature proposed a new promising approach based on supervised learning, by leveraging powerful open-loop optimal control solvers to generate training data and neural networks as efficient high-dimensional function approximators to fit the closed-loop optimal control. This approach successfully handles certain high-dimensional optimal control problems but still performs poorly on more challenging problems. One of the crucial reasons for the failure is the so-called distribution mismatch phenomenon brought by the controlled dynamics. In this paper, we investigate this phenomenon and propose the Progressive Optimal Path Sampling (POPS) method to mitigate this problem. We theoretically prove that this enhanced sampling strategy outperforms both the vanilla approach and the widely used Dataset Aggregation (DAgger) method on the classical linear-quadratic regulator by a factor proportional to the total time duration. We further numerically demonstrate that the proposed sampling strategy significantly improves the performance on tested control problems, including the optimal landing problem of a quadrotor and the optimal reaching problem of a 7 DoF manipulator.

preprint · 2022 · arXiv

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the reduced mechanism size given the error tolerance of a group of pre-selected benchmark quantities. The key idea of DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem. In order to explore the high-dimensional Boolean space efficiently, an iterative procedure of DNN-assisted data sampling and DNN training is implemented. The results show that DNN assistance improves sampling efficiency significantly, selecting only $10^5$ samples out of $10^{34}$ possible samples for the DNN to achieve sufficient accuracy. The results demonstrate the capability of the DNN to recognize key species and reasonably predict reduced mechanism performance. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. Comparing ignition delay times, laminar flame speeds, and temperatures in PSRs shows that the resulting skeletal mechanism has fewer species (45) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the Path Flux Analysis (PFA) method. In addition, the skeletal mechanism can be further reduced to 28 species if only atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2) are considered. DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.
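The Boolean-vector encoding described in this abstract can be illustrated with a small Python sketch. Everything below is a toy under stated assumptions: the species names, the weights, and the `toy_error` function are made up, and the greedy search is a simple stand-in rather than the DNN-assisted sampling procedure of the paper.

```python
def apply_mask(species, mask):
    """Return the species kept by a Boolean mask (True = keep)."""
    return [s for s, keep in zip(species, mask) if keep]


def greedy_reduce(species, error_of, tolerance):
    """Drop species one at a time while the benchmark error stays within tolerance.

    error_of is a hypothetical callable standing in for the benchmark
    evaluation (or a surrogate of it). Greedy removal is an assumption
    made for brevity, not the method of the paper.
    """
    mask = [True] * len(species)
    for i in range(len(species)):
        trial = mask.copy()
        trial[i] = False
        if error_of(trial) <= tolerance:
            mask = trial
    return mask


# Toy setup: made-up importance weights; the error is the weight removed.
species = ["A", "B", "C", "D"]
weights = [0.50, 0.02, 0.30, 0.01]


def toy_error(mask):
    return sum(w for w, keep in zip(weights, mask) if not keep)


mask = greedy_reduce(species, toy_error, tolerance=0.05)
reduced = apply_mask(species, mask)
search_space = 2 ** len(species)  # number of candidate Boolean vectors
```

With n species there are 2^n candidate masks, which is why exhaustive search is impractical for realistic mechanisms and some form of guided sampling is needed.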

preprint · 2022 · arXiv

A deep potential model with long-range electrostatic interactions

Machine learning models for the potential energy of multi-atomic systems, such as the deep potential (DP) model, make possible molecular simulations with the accuracy of quantum mechanical density functional theory, at a cost only moderately higher than that of empirical force fields. However, the majority of these models lack explicit long-range interactions and fail to describe properties that derive from the Coulombic tail of the forces. To overcome this limitation we extend the DP model by approximating the long-range electrostatic interaction between ions (nuclei+core electrons) and valence electrons with that of distributions of spherical Gaussian charges located at ionic and electronic sites. The latter are rigorously defined in terms of the centers of the maximally localized Wannier distributions, whose dependence on the local atomic environment is modeled accurately by a deep neural network. In the deep potential long-range (DPLR) model, the electrostatic energy of the Gaussian charge system is added to short-range interactions that are represented as in the standard DP model. The resulting potential energy surface is smooth and possesses analytical forces and virial. Missing effects in the standard DP scheme are recovered, improving on accuracy and predictive power. By including long-range electrostatics, DPLR correctly extrapolates to large systems the potential energy surface learned from quantum mechanical calculations on smaller systems. We illustrate the approach with three examples, the potential energy profile of the water dimer, the free energy of interaction of a water molecule with a liquid water slab, and the phonon dispersion curves of the NaCl crystal.

preprint · 2022 · arXiv

A Machine Learning Enhanced Algorithm for the Optimal Landing Problem

We propose a machine learning enhanced algorithm for solving the optimal landing problem. Using Pontryagin's minimum principle, we derive a two-point boundary value problem for the landing problem. The proposed algorithm uses deep learning to predict the optimal landing time and a space-marching technique to provide good initial guesses for the boundary value problem solver. The performance of the proposed method is studied using the quadrotor example, a reasonably high dimensional and strongly nonlinear system. Drastic improvement in reliability and efficiency is observed.

preprint · 2022 · arXiv

A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

Machine learning has long been considered a black box for predicting combustion chemical kinetics due to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to answer two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be. Sampling and preprocessing determine the DNN training dataset and thus affect the DNN's prediction ability. The current work proposes using the Box-Cox transformation (BCT) to preprocess the combustion data. In addition, this work compares different sampling methods with or without preprocessing, including the Monte Carlo method, manifold sampling, a generative neural network method (cycle-GAN), and the newly proposed multi-scale sampling. Our results reveal that the DNN trained on the manifold data can capture the chemical kinetics in limited configurations but cannot remain robust toward perturbation, which is inevitable for a DNN coupled with the flow field. The Monte Carlo and cycle-GAN samplings can cover a wider phase space but fail to capture small-scale intermediate species, producing poor prediction results. A three-hidden-layer DNN, based on the multi-scale method without specific flame simulation data, can predict chemical kinetics in various scenarios and remains stable during temporal evolution. This single DNN is readily implemented with several CFD codes and validated in various combustors, including (1) zero-dimensional autoignition, (2) one-dimensional freely propagating flame, (3) two-dimensional jet flame with triple-flame structure, and (4) three-dimensional turbulent lifted flames. The results demonstrate the satisfactory accuracy and generalization ability of the pre-trained DNN. The Fortran and Python versions of the DNN and example code are attached in the supplementary material for reproducibility.
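For readers unfamiliar with the Box-Cox transformation named in this abstract, the following Python sketch shows the standard one-parameter form and its inverse. The value `lam = 0.1` and the sample mass fractions are assumptions chosen for illustration; the abstract does not state which parameter the authors use.

```python
import math


def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Invert box_cox for the same lam."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# Hypothetical mass fractions spanning many orders of magnitude.
lam = 0.1  # assumed value, for illustration only
fractions = [1e-12, 1e-6, 1e-1]
transformed = [box_cox(x, lam) for x in fractions]
recovered = [inverse_box_cox(y, lam) for y in transformed]
```

The point of such a transform is that quantities differing by eleven orders of magnitude are mapped to values of comparable size, so that small-scale species are not drowned out during training.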

preprint · 2022 · arXiv

An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is made either using the kernel method or the two-layer neural network model, in the context of a fitted Q-iteration algorithm with explicit regularization. We establish an $\tilde{O}(H^3|\mathcal {A}|^{\frac14}n^{-\frac14})$ bound for the optimal policy with $Hn$ samples, where $H$ is the length of each episode and $|\mathcal {A}|$ is the size of action space. Our analysis hinges on analyzing the $L^2$ error of the approximated Q-function using $n$ data points. Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.
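To make the scaling in this bound concrete, here is a short worked relation. It is a reading aid only: it ignores the logarithmic factors hidden in $\tilde{O}$, treats the bound as if it were tight, and introduces $\varepsilon(n)$ and the constant $C$ as shorthand that does not appear in the abstract.

```latex
\varepsilon(n) \;\lesssim\; C\, H^{3}\, |\mathcal{A}|^{1/4}\, n^{-1/4}
\quad\Longrightarrow\quad
\frac{\varepsilon(16n)}{\varepsilon(n)} = 16^{-1/4} = \frac{1}{2},
\qquad
\frac{\varepsilon\big|_{2H}}{\varepsilon\big|_{H}} = 2^{3} = 8 .
```

Under these assumptions, halving the bound takes sixteen times as many data points, while doubling the episode length multiplies it by eight; the state-space dimension does not enter at all.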

preprint · 2022 · arXiv

Bridging Traditional and Machine Learning-based Algorithms for Solving PDEs: The Random Feature Method

One of the oldest and most studied subjects in scientific computing is algorithms for solving partial differential equations (PDEs). A long list of numerical methods has been proposed and successfully used for various applications. In recent years, deep learning methods have shown their superiority for high-dimensional PDEs where traditional methods fail. However, for low-dimensional problems, it remains unclear whether these methods have a real advantage over traditional algorithms as a direct solver. In this work, we propose the random feature method (RFM) for solving PDEs, a natural bridge between traditional and machine learning-based algorithms. RFM is based on a combination of well-known ideas: 1. representation of the approximate solution using random feature functions; 2. the collocation method to take care of the PDE; 3. the penalty method to treat the boundary conditions, which allows us to treat the boundary condition and the PDE on the same footing. We find it crucial to add several additional components including multi-scale representation and rescaling the weights in the loss function. We demonstrate that the method exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. In addition, we find that RFM is particularly suited for complex problems with complex geometry, where both traditional and machine learning-based algorithms encounter difficulties.

preprint · 2022 · arXiv

Empowering Optimal Control with Machine Learning: A Perspective from Model Predictive Control

Solving complex optimal control problems has posed computational challenges for a long time. Recent advances in machine learning have provided us with new opportunities to address these challenges. This paper takes model predictive control, a popular optimal control method, as the primary example to survey recent progress that leverages machine learning techniques to empower optimal control solvers. We also discuss some of the main challenges encountered when applying machine learning to develop more robust optimal control algorithms.

preprint · 2021 · arXiv

Generalization and Memorization: The Bias Potential Model

Models for learning probability distributions such as generative models and density estimators behave quite differently from models for learning functions. One example is found in the memorization phenomenon, namely the ultimate convergence to the empirical distribution, that occurs in generative adversarial networks (GANs). For this reason, the issue of generalization is more subtle than that for supervised learning. For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted, even though in the long term the model either memorizes the samples or diverges.

preprint · 2021 · arXiv

MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs

In this paper, we propose a machine learning approach via model-operator-data network (MOD-Net) for solving PDEs. A MOD-Net is driven by a model to solve PDEs based on operator representation with regularization from data. For linear PDEs, we use a DNN to parameterize the Green's function and obtain the neural operator to approximate the solution according to the Green's method. To train the DNN, the empirical risk consists of the mean squared loss with the least square formulation or the variational formulation of the governing equation and boundary conditions. For complicated problems, the empirical risk also includes a few labels, which are computed on coarse grid points at cheap computational cost and significantly improve the model accuracy. Intuitively, the labeled dataset works as a regularization in addition to the model constraints. The MOD-Net solves a family of PDEs rather than a specific one and is much more efficient than the original neural operator because few expensive labels are required. We numerically show that MOD-Net is very efficient in solving the Poisson equation and the one-dimensional radiative transfer equation. For nonlinear PDEs, the nonlinear MOD-Net can be similarly used as an ansatz for solving nonlinear PDEs, exemplified by solving several nonlinear PDE problems, such as the Burgers equation.

preprint · 2021 · arXiv

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models including the random feature model, the two-layer neural network model and the residual network model. We prove that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the models are sufficiently over-parametrized.

preprint · 2020 · arXiv

A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero training loss exponentially fast regardless of the quality of the labels. In addition, it is proved that throughout the training process the functions represented by the neural network model are uniformly close to those of a kernel method. For general values of the network width and training data size, sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space.

preprint · 2020 · arXiv

A mathematical model for universal semantics

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting external knowledge-base or thesaurus. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source codes are publicly available on https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica

preprint · 2020 · arXiv

A Priori Estimates of the Population Risk for Two-layer Neural Networks

New estimates for the population risk are established for two-layer neural networks. These estimates are nearly optimal in the sense that the error rates scale in the same way as the Monte Carlo error rates. They are equally effective in the over-parametrized regime when the network size is much larger than the size of the dataset. These new estimates are a priori in nature in the sense that the bounds depend only on some norms of the underlying functions to be fitted, not the parameters in the model, in contrast with most existing results which are a posteriori in nature. Using these a priori estimates, we provide a perspective for understanding why two-layer neural networks perform better than the related kernel methods.

preprint · 2020 · arXiv

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
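The practical meaning of the $t^{-4/(d-2)}$ rate can be seen with a few lines of Python. This is a back-of-the-envelope sketch, assuming the population risk decays exactly at that best-case rate; the function names are hypothetical and the abstract itself only states that the rate cannot be faster.

```python
import math


def rate_exponent(d):
    """Exponent 4/(d-2) of the best-case decay t**(-4/(d-2)); needs d > 2."""
    if d <= 2:
        raise ValueError("the rate is only meaningful for d > 2")
    return 4.0 / (d - 2)


def time_factor(risk_reduction, d):
    """Factor by which training time must grow to cut the risk by
    `risk_reduction`, if the risk decays exactly like t**(-4/(d-2))."""
    return risk_reduction ** (1.0 / rate_exponent(d))


# Training-time multiplier needed for a tenfold risk reduction.
factor_d6 = time_factor(10.0, 6)    # exponent 1
factor_d42 = time_factor(10.0, 42)  # exponent 0.1
```

Under this assumption a tenfold risk reduction costs ten times more training time in dimension 6 but about ten billion times more in dimension 42, which is the curse of dimensionality the abstract refers to.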

preprint · 2020 · arXiv

Coarse-grained spectral projection (CGSP): a deep learning-assisted approach to quantum unitary dynamics

We propose the coarse-grained spectral projection method (CGSP), a deep learning-assisted approach for tackling quantum unitary dynamic problems with an emphasis on quench dynamics. We show that CGSP can systematically extract spectral components of many-body quantum states with a sophisticated neural network quantum ansatz. CGSP fully exploits the linear unitary nature of the quantum dynamics, and is potentially superior to other quantum Monte Carlo methods for ergodic dynamics. Preliminary numerical results on 1D XXZ models with periodic boundary conditions are presented to demonstrate the practicality of CGSP.

preprint · 2020 · arXiv

Deep neural network for the dielectric response of insulators

We introduce a deep neural network to model in a symmetry preserving way the environmental dependence of the centers of the electronic charge. The model learns from ab-initio density functional theory, wherein the electronic centers are uniquely assigned by the maximally localized Wannier functions. When combined with the Deep Potential model of the atomic potential energy surface, the scheme predicts the dielectric response of insulators for trajectories inaccessible to direct ab-initio simulation. The scheme is non-perturbative and can capture the response of a mutating chemical environment. We demonstrate the approach by calculating the infrared spectra of liquid water at standard conditions, and of ice under extreme pressure, when it transforms from a molecular to an ionic crystal.

preprint · 2020 · arXiv

Ground state energy functional with Hartree-Fock efficiency and chemical accuracy

We introduce the Deep Post-Hartree-Fock (DeePHF) method, a machine learning based scheme for constructing accurate and transferable models for the ground-state energy of electronic structure problems. DeePHF predicts the energy difference between results of highly accurate models such as the coupled cluster method and low accuracy models such as the Hartree-Fock (HF) method, using the ground-state electronic orbitals as the input. It preserves all the symmetries of the original high accuracy model. The added computational cost is less than that of the reference HF or DFT and scales linearly with respect to system size. We examine the performance of DeePHF on organic molecular systems using publicly available datasets and obtain state-of-the-art performance, particularly on large datasets.

preprint · 2020 · arXiv

Integrating Machine Learning with Physics-Based Modeling

Machine learning is poised as a very powerful tool that can drastically improve our ability to carry out scientific research. However, many issues need to be addressed before this becomes a reality. This article focuses on one particular issue of broad interest: How can we integrate machine learning with physics-based modeling to develop new interpretable and truly reliable physical models? After introducing the general guidelines, we discuss the two most important issues for developing machine learning-based physical models: Imposing physical constraints and obtaining optimal datasets. We also provide a simple and intuitive explanation for the fundamental reasons behind the success of modern machine learning, as well as an introduction to the concurrent machine learning framework needed for integrating machine learning with physics-based modeling. Molecular dynamics and moment closure of kinetic equations are used as examples to illustrate the main issues discussed. We end with a general discussion on where this integration will lead us to, and where the new frontier will be after machine learning is successfully integrated into scientific modeling.

preprint · 2020 · arXiv

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can be approximated by multi-layer neural networks with dimension-independent convergence rates. The key to this work is a new way of representing functions in some form of expectations, motivated by multi-layer neural networks. This representation allows us to define a new class of continuous models for machine learning. We show that the gradient flow defined this way is the natural continuous analog of the gradient descent dynamics for the associated multi-layer neural networks. We show that the path-norm increases at most polynomially under this continuous gradient flow dynamics.

preprint · 2020 · arXiv

Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

For 35 years, ab initio molecular dynamics (AIMD) has been the method of choice for modeling complex atomistic phenomena from first principles. However, most AIMD applications are limited by computational cost to systems with thousands of atoms at most. We report that a machine learning-based simulation protocol (Deep Potential Molecular Dynamics), while retaining ab initio accuracy, can simulate more than 1 nanosecond-long trajectory of over 100 million atoms per day, using a highly optimized code (GPU DeePMD-kit) on the Summit supercomputer. Our code can efficiently scale up to the entire Summit supercomputer, attaining 91 PFLOPS in double precision (45.5% of the peak) and 162/275 PFLOPS in mixed-single/half precision. The great accomplishment of this work is that it opens the door to simulating unprecedented size and time scales with ab initio accuracy. It also poses new challenges to the next-generation supercomputer for a better integration of machine learning and physical modeling.
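As a quick consistency check on the figures quoted in this abstract, the double-precision peak implied by the two stated numbers follows from one division. The symbol $P_{\text{peak}}$ is introduced here only for this calculation.

```latex
P_{\text{peak}} = \frac{91\ \text{PFLOPS}}{0.455} = 200\ \text{PFLOPS}
```

So the stated 45.5% corresponds to a double-precision peak of about 200 PFLOPS for the machine as used in that run.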

preprint · 2020 · arXiv

The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametrized regime: an early phase in which the GD dynamics follows closely that of the corresponding random feature model and the neurons are effectively quenched, followed by a late phase in which the neurons are divided into two groups: a group of a few "activated" neurons that dominate the dynamics and a group of background (or "quenched") neurons that support the continued activation and deactivation process. This neural network-like behavior continues into the mildly over-parametrized regime, where it undergoes a transition to a random feature-like behavior. The quenching-activation process seems to provide a clear mechanism for "implicit regularization". This is qualitatively different from the dynamics associated with the "mean-field" scaling where all neurons participate equally and there do not appear to be qualitative changes when the network parameters are changed.

preprint · 2020 · arXiv

The Slow Deterioration of the Generalization Error of the Random Feature Model

The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of a large generalization gap, and is due to the occurrence of very small eigenvalues for the associated Gram matrix. In this paper, we examine the dynamic behavior of the gradient descent algorithm in this regime. We show, both theoretically and experimentally, that there is a dynamic self-correction mechanism at work: the larger the eventual generalization gap, the slower it develops, both effects being due to the small eigenvalues. This gives us ample time to stop the training process and obtain solutions with good generalization properties.

preprint · 2019 · arXiv

DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models

In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that can potentially allow us to perform molecular dynamics simulations for large scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a very non-trivial task. A key component in this task is the generation of datasets used in model training. In this paper, we introduce the Deep Potential GENerator (DP-GEN), an open-source software platform that implements the recently proposed "on-the-fly" learning procedure [Phys. Rev. Materials 3, 023804] and is capable of generating uniformly accurate deep learning based PES models in a way that minimizes human intervention and the computational cost for data generation and model training. DP-GEN automatically and iteratively performs three steps: exploration, labeling, and training. It supports various popular packages for these three steps: LAMMPS for exploration, Quantum Espresso, VASP, CP2K, etc. for labeling, and DeePMD-kit for training. It also allows automatic job submission and result collection on different types of machines, such as high performance clusters and cloud machines, and is adaptive to different job management tools, including Slurm, PBS, and LSF. As a concrete example, we illustrate the details of the process for generating a general-purpose PES model for Cu using DP-GEN.
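The exploration, labeling, and training cycle described in this abstract can be mimicked with a toy Python loop. This is not DP-GEN's interface: every function name is hypothetical, the "reference calculation" is a toy function, the "model" is a nearest-neighbour lookup, and distance to the nearest labeled point is used as an uncertainty proxy purely for brevity.

```python
def reference_label(x):
    """Stand-in for an expensive first-principles calculation (toy function)."""
    return x * x


def train(dataset):
    """Toy 'training': a nearest-neighbour predictor over the labeled data."""
    points = sorted(dataset)

    def predict(x):
        nearest = min(points, key=lambda p: abs(p - x))
        # Return the prediction and an uncertainty proxy (assumed, see lead-in).
        return dataset[nearest], abs(nearest - x)

    return predict


def concurrent_learning(candidates, initial, max_uncertainty, max_iters=10):
    """Iterate train -> explore -> label until no candidate looks uncertain."""
    dataset = {x: reference_label(x) for x in initial}
    for _ in range(max_iters):
        model = train(dataset)                                   # training
        uncertain = [x for x in candidates
                     if model(x)[1] > max_uncertainty]           # exploration
        if not uncertain:
            break
        # Label only the most uncertain candidate to keep labeling cheap.
        worst = max(uncertain, key=lambda x: model(x)[1])
        dataset[worst] = reference_label(worst)                  # labeling
    return dataset


candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
data = concurrent_learning(candidates, initial=[0.0], max_uncertainty=0.25)
```

The design point the toy tries to convey is that expensive labels are requested only where the current model is judged unreliable, so the labeled set stays small while covering the explored region.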

preprint · 2018 · arXiv

A Mean-Field Optimal Control Formulation of Deep Learning

Recent work linking deep neural networks and dynamical systems opened up new avenues to analyze deep learning. In particular, it is observed that new insights can be obtained by recasting deep learning as an optimal control problem on difference or differential equations. However, the mathematical aspects of such a formulation have not been systematically explored. This paper introduces the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem. Mirroring the development of classical optimal control, we state and prove optimality conditions of both the Hamilton-Jacobi-Bellman type and the Pontryagin type. These mean-field results reflect the probabilistic nature of the learning problem. In addition, by appealing to the mean-field Pontryagin's maximum principle, we establish some quantitative relationships between population and empirical learning problems. This serves to establish a mathematical foundation for investigating the algorithmic and theoretical connections between optimal control and deep learning.

preprint · 2018 · arXiv

End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems

Machine learning models are changing the paradigm of molecular modeling, which is a fundamental tool for material science, chemistry, and computational biology. Of particular interest is the inter-atomic potential energy surface (PES). Here we develop Deep Potential - Smooth Edition (DeepPot-SE), an end-to-end machine learning-based PES model, which is able to efficiently represent the PES for a wide variety of systems with the accuracy of ab initio quantum mechanics models. By construction, DeepPot-SE is extensive and continuously differentiable, scales linearly with system size, and preserves all the natural symmetries of the system. Further, we show that DeepPot-SE describes finite and extended systems including organic molecules, metals, semiconductors, and insulators with high fidelity.

preprint2018arXiv

Solving high-dimensional partial differential equations using deep learning

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality". This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost. This opens up new possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their inter-relationships.

preprint2018arXiv

Solving Many-Electron Schrödinger Equation Using Deep Neural Networks

We introduce a new family of trial wave-functions based on deep neural networks to solve the many-electron Schrödinger equation. The Pauli exclusion principle is dealt with explicitly to ensure that the trial wave-functions are physical. The optimal trial wave-function is obtained through variational Monte Carlo and the computational cost scales quadratically with the number of electrons. The algorithm does not make use of any prior knowledge such as atomic orbitals. Yet it is able to represent accurately the ground-states of the tested systems, including He, H2, Be, B, LiH, and a chain of 10 hydrogen atoms. This opens up new possibilities for solving large-scale many-electron Schrödinger equations.

preprint2017arXiv

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives.

preprint2017arXiv

Deep Potential: a general representation of a many-body potential energy surface

We present a simple, yet general, end-to-end deep neural network representation of the potential energy surface for atomic and molecular systems. This methodology, which we call Deep Potential, is "first-principle" based, in the sense that no ad hoc approximations or empirical fitting functions are required. The neural network structure naturally respects the underlying symmetries of the systems. When tested on a wide variety of examples, Deep Potential is able to reproduce the original model, whether empirical or quantum mechanics based, within chemical accuracy. The computational cost of this new model is not substantially larger than that of empirical force fields. In addition, the method has promising scalability properties. This brings us one step closer to being able to carry out molecular simulations with accuracy comparable to that of quantum mechanics models and computational cost comparable to that of empirical potentials.

preprint2017arXiv

Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

High-dimensional partial differential equations (PDEs) appear in a number of models from the financial industry, such as derivative pricing models, credit valuation adjustment (CVA) models, or portfolio optimization models. The PDEs in such applications are high-dimensional as the dimension corresponds to the number of financial assets in a portfolio. Moreover, such PDEs are often fully nonlinear due to the need to incorporate certain nonlinear phenomena such as default risks, transaction costs, volatility uncertainty (Knightian uncertainty), or trading constraints in the model. Such high-dimensional fully nonlinear PDEs are exceedingly difficult to solve as the computational effort for standard approximation methods grows exponentially with the dimension. In this work we propose a new method for solving high-dimensional fully nonlinear second-order PDEs. Our method can in particular be used to sample from high-dimensional nonlinear expectations. The method is based on (i) a connection between fully nonlinear second-order PDEs and second-order backward stochastic differential equations (2BSDEs), (ii) a merged formulation of the PDE and the 2BSDE problem, (iii) a temporal forward discretization of the 2BSDE and a spatial approximation via deep neural nets, and (iv) a stochastic gradient descent-type optimization procedure. Numerical results obtained using TensorFlow in Python illustrate the efficiency and the accuracy of the method in the cases of a $100$-dimensional Black-Scholes-Barenblatt equation, a $100$-dimensional Hamilton-Jacobi-Bellman equation, and a nonlinear expectation of a $100$-dimensional $G$-Brownian motion.

preprint2016arXiv

Convolutional neural networks with low-rank regularization

Large CNNs have delivered impressive performance in various computer vision applications. But the storage and computation requirements make it problematic to deploy these models on mobile devices. Recently, tensor decompositions have been used for speeding up CNNs. In this paper, we further develop the tensor decomposition technique. We propose a new algorithm for computing the low-rank tensor decomposition for removing the redundancy in the convolution kernels. The algorithm finds the exact global optimizer of the decomposition and is more effective than iterative methods. Based on the decomposition, we further propose a new method for training low-rank constrained CNNs from scratch. Interestingly, while achieving a significant speedup, the low-rank constrained CNNs sometimes deliver significantly better performance than their non-constrained counterparts. On the CIFAR-10 dataset, the proposed low-rank NIN model achieves $91.31\%$ accuracy (without data augmentation), which also improves upon the state-of-the-art result. We evaluated the proposed method on the CIFAR-10 and ILSVRC12 datasets for a variety of modern CNNs, including AlexNet, NIN, VGG and GoogleNet, with success. For example, the forward time of VGG-16 is reduced by half while the performance is still comparable. Empirical success suggests that low-rank tensor decompositions can be a very useful tool for speeding up large CNNs.

preprint2016arXiv

Deep Learning Approximation for Stochastic Control Problems

Many real world stochastic control problems suffer from the "curse of dimensionality". To overcome this difficulty, we develop a deep learning approach that directly solves high-dimensional stochastic control problems based on Monte-Carlo sampling. We approximate the time-dependent controls as feedforward neural networks and stack these networks together through model dynamics. The objective function for the control problem plays the role of the loss function for the deep neural network. We test this approach using examples from the areas of optimal trading and energy storage. Our results suggest that the algorithm presented here achieves satisfactory accuracy and at the same time, can handle rather high dimensional problems.

preprint2015arXiv

Functional Frank-Wolfe Boosting for General Loss Functions

Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using $l_1$ regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost) applied to general loss functions. By using exponential loss, the FWBoost algorithm can be rewritten as a variant of AdaBoost for binary classification. FWBoost algorithms have exactly the same form as existing boosting methods, in terms of making calls to a base learning algorithm with a different weight update. This direct connection between boosting and Frank-Wolfe yields a new algorithm that is as practical as existing boosting methods but with new guarantees and rates of convergence. Experimental results show that the test performance of FWBoost is not degraded with larger rounds in boosting, which is consistent with the theoretical analysis.
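
To make the link between $l_1$ regularization and Frank-Wolfe concrete, here is a minimal sketch of the generic Frank-Wolfe iteration for least squares over an $l_1$ ball. It is not the FWBoost algorithm itself: the function names, the squared loss, the step size $2/(k+2)$, and the toy data are all assumptions made for illustration. Each iteration moves toward a single vertex of the $l_1$ ball, which loosely mirrors how a boosting round adds one base hypothesis.

```python
def loss_and_grad(X, y, w):
    """Squared loss 0.5 * ||Xw - y||^2 and its gradient X^T (Xw - y)."""
    residual = [sum(xij * wj for xij, wj in zip(row, w)) - yi
                for row, yi in zip(X, y)]
    loss = 0.5 * sum(r * r for r in residual)
    grad = [sum(X[i][j] * residual[i] for i in range(len(X)))
            for j in range(len(w))]
    return loss, grad


def frank_wolfe_l1(X, y, tau, iters=200, tol=1e-12):
    """Generic Frank-Wolfe on the l1 ball of radius tau (illustrative only)."""
    w = [0.0] * len(X[0])
    for k in range(iters):
        _, g = loss_and_grad(X, y, w)
        # Linear minimization over the l1 ball: pick the signed vertex
        # along the coordinate with the largest gradient magnitude.
        j = max(range(len(w)), key=lambda i: abs(g[i]))
        s = [0.0] * len(w)
        s[j] = -tau if g[j] > 0 else tau
        # Frank-Wolfe duality gap; zero means w is already optimal.
        gap = sum(gi * (wi - si) for gi, wi, si in zip(g, w, s))
        if gap <= tol:
            break
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        w = [(1.0 - gamma) * wi + gamma * si for wi, si in zip(w, s)]
    return w


# Toy problem (hypothetical data): the constrained optimum is w = (0.5, 0.5).
X_demo = [[1.0, 0.0], [0.0, 1.0]]
y_demo = [1.0, 1.0]
w_demo = frank_wolfe_l1(X_demo, y_demo, tau=1.0)
```

Because every iterate is a convex combination of vertices of the ball, the $l_1$ constraint holds without any projection step.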

preprint2015arXiv

Multiscale Adaptive Representation of Signals: I. The Basic Framework

We introduce a framework for designing multi-scale, adaptive, shift-invariant frames and bi-frames for representing signals. The new framework, called AdaFrame, improves over dictionary learning-based techniques in terms of computational efficiency at inference time. It improves on classical multi-scale bases such as wavelet frames in terms of coding efficiency. It provides an attractive alternative to dictionary learning-based techniques for low level signal processing tasks, such as compression and denoising, as well as high level tasks, such as feature extraction for object recognition. Connections with deep convolutional networks are also discussed. In particular, the proposed framework reveals a drawback in the commonly used approach for visualizing the activations of the intermediate layers in convolutional networks, and suggests a natural alternative.

preprint2015arXiv

Noisy Hegselmann-Krause Systems: Phase Transition and the 2R-Conjecture

The classic Hegselmann-Krause (HK) model for opinion dynamics consists of a set of agents on the real line, each one instructed to move, at every time step, to the mass center of all the agents within a fixed distance R. In this work, we investigate the effects of noise in the continuous-time version of the model as described by its mean-field limiting Fokker-Planck equation. In the presence of a finite number of agents, the system exhibits a phase transition from order to disorder as the noise increases. The ordered phase features clusters whose width depends only on the noise level. We introduce an order parameter to track the phase transition and resolve the corresponding phase diagram. The system undergoes a phase transition for small R but none for larger R. Based on the stability analysis of the mean-field equation, we derive the existence of a forbidden zone for the disordered phase to emerge. We also provide a theoretical explanation for the well-known 2R conjecture, which states that, for a random initial distribution in a fixed interval, the final configuration consists of clusters separated by a distance of roughly 2R. Our theoretical analysis also confirms previous simulations and predicts properties of the noisy HK model in higher dimension.
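
The update rule of the classic HK model described above can be sketched in a few lines. This covers only the noise-free, discrete-time model, not the noisy continuous-time version or the Fokker-Planck analysis studied in the paper. The function names `hk_step` and `hk_run` and the sample positions are hypothetical.

```python
def hk_step(positions, radius):
    """One synchronous HK update: every agent moves to the mass center
    of all agents (itself included) within distance `radius`."""
    updated = []
    for x in positions:
        neighbors = [y for y in positions if abs(y - x) <= radius]
        updated.append(sum(neighbors) / len(neighbors))
    return updated


def hk_run(positions, radius, max_steps=1000, tol=1e-12):
    """Iterate hk_step until no agent moves by more than `tol`."""
    current = list(positions)
    for _ in range(max_steps):
        nxt = hk_step(current, radius)
        if max(abs(a - b) for a, b in zip(nxt, current)) < tol:
            return nxt
        current = nxt
    return current


# Two nearby agents merge into one cluster; the distant agent stays put.
clusters = hk_run([0.0, 0.5, 3.0], radius=1.0)
```

In this small example the agents at 0.0 and 0.5 see each other and meet at 0.25, while the agent at 3.0 has no neighbor within R = 1 and never moves.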

preprint2013arXiv

Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory

We present for the first time an efficient iterative method to directly solve the four-component Dirac-Kohn-Sham (DKS) density functional theory. Due to the existence of the negative energy continuum in the DKS operator, the existing iterative techniques for solving the Kohn-Sham systems cannot be efficiently applied to solve the DKS systems. The key component of our method is a novel filtering step (F) which acts as a preconditioner in the framework of the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The resulting method, dubbed the LOBPCG-F method, is able to compute the desired eigenvalues and eigenvectors in the positive energy band without computing any state in the negative energy band. The LOBPCG-F method introduces mild extra cost compared to the standard LOBPCG method and can be easily implemented. We demonstrate our method in the pseudopotential framework with a planewave basis set which naturally satisfies the kinetic balance prescription. Numerical results for Pt$_{2}$, Au$_{2}$, TlF, and Bi$_{2}$Se$_{3}$ indicate that the LOBPCG-F method is a robust and efficient method for investigating the relativistic effect in systems containing heavy elements.

preprint2013arXiv

Exact renormalization group analysis of turbulent transport by the shear flow

The exact renormalization group (RG) method initiated by Wilson and further developed by Polchinski is used to study the shear flow model proposed by Avellaneda and Majda as a simplified model for the diffusive transport of a passive scalar by a turbulent velocity field. It is shown that this exact RG method is capable of recovering all the scaling regimes as the spectral parameters of velocity statistics vary, found by Avellaneda and Majda in their rigorous study of this model. This gives further confidence that the RG method, if implemented in the right way instead of using drastic truncations as in the Yakhot-Orszag approximate RG scheme, does give the correct prediction for the large scale behaviors of solutions of stochastic partial differential equations (PDEs). We also derive the analog of the "large eddy simulation" models when a finite amount of small scales are eliminated from the problem.

preprint2012arXiv

Modified string method for finding minimum energy path

We present an efficient algorithm for calculating the minimum energy path (MEP) and energy barriers between local minima on a multidimensional potential energy surface (PES). Such paths play a central role in the understanding of transition pathways between metastable states. Our method relies on the original formulation of the string method [Phys. Rev. B 66, 052301 (2002)], i.e. to evolve a smooth curve along a direction normal to the curve. The algorithm works by performing minimization steps on hyperplanes normal to the curve. Therefore the problem of finding the MEP on the PES is remodeled as a set of constrained minimization problems. This provides the flexibility of using minimization algorithms faster than the steepest descent method used in the simplified string method [J. Chem. Phys. 126(16), 164103 (2007)]. At the same time, it provides a more direct analog of the finite temperature string method. The applicability of the algorithm is demonstrated using various examples.

preprint2011arXiv

Atomistic simulations of rare events using gentlest ascent dynamics

The dynamics of complex systems often involve thermally activated barrier crossing events that allow these systems to move from one basin of attraction on the high dimensional energy surface to another. Such events are ubiquitous, but challenging to simulate using conventional simulation tools, such as molecular dynamics. Recently, Weinan E et al. [Nonlinearity 24(6), 1831 (2011)] proposed a set of dynamic equations, the gentlest ascent dynamics (GAD), to describe the escape of a system from a basin of attraction and proved that solutions of GAD converge to index-1 saddle points of the underlying energy. In this paper, we extend GAD to enable finite temperature simulations in which the system hops between different saddle points on the energy surface. An effective strategy to use GAD to sample an ensemble of low barrier saddle points located in the vicinity of a locally stable configuration on the high dimensional energy surface is proposed. The utility of the method is demonstrated by studying the low barrier saddle points associated with point defect activity on a surface. This is done for two representative systems, namely, (a) a surface vacancy and ad-atom pair and (b) a heptamer island on the (111) surface of copper.

preprint2011arXiv

Cauchy-Born rule and spin density wave for the spin-polarized Thomas-Fermi-Dirac-von Weizsacker model

The electronic structure (electron charges and spins) of a perfect crystal under external magnetic field is analyzed using the spin-polarized Thomas-Fermi-Dirac-von Weizsacker model. An extension of the classical Cauchy-Born rule for crystal lattices is established for the electronic structure under sharp stability conditions on charge density wave and spin density wave. A Landau-Lifschitz type micromagnetic energy functional is derived.

preprint2011arXiv

The Gentlest Ascent Dynamics

Dynamical systems that describe the escape from the basins of attraction of stable invariant sets are presented and analyzed. It is shown that the stable fixed points of such dynamical systems are the index-1 saddle points. Generalizations to high index saddle points are discussed. Both gradient and non-gradient systems are considered. Preliminary results on the nature of the dynamical behavior are presented.
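
The claim that index-1 saddle points become stable fixed points can be illustrated numerically. The sketch below assumes the commonly cited gradient-system form of GAD, $\dot{x} = -\nabla V(x) + 2\,\frac{(\nabla V, v)}{(v, v)}\,v$ and $\dot{v} = -\nabla^2 V(x)\,v + \frac{(v, \nabla^2 V v)}{(v, v)}\,v$. These equations are not stated in the abstract. The double-well potential $V(x, y) = (x^2 - 1)^2 + y^2$, the forward Euler scheme, and all names are also assumptions chosen for illustration.

```python
import math


def grad(x, y):
    # Gradient of the illustrative potential V(x, y) = (x^2 - 1)^2 + y^2.
    return (4.0 * x * (x * x - 1.0), 2.0 * y)


def hess_vec(x, y, v):
    # The Hessian of V is diagonal: diag(12 x^2 - 4, 2).
    return ((12.0 * x * x - 4.0) * v[0], 2.0 * v[1])


def gad(x, y, v, dt=0.01, steps=5000):
    """Forward-Euler integration of the assumed GAD equations.
    v is kept at unit length, so (v, v) = 1 in the formulas."""
    norm = math.hypot(v[0], v[1])
    v = (v[0] / norm, v[1] / norm)
    for _ in range(steps):
        g = grad(x, y)
        hv = hess_vec(x, y, v)
        gv = g[0] * v[0] + g[1] * v[1]      # (grad V, v)
        vhv = v[0] * hv[0] + v[1] * hv[1]   # (v, Hess V v)
        # Position: descend, but reverse the force component along v.
        x_new = x + dt * (-g[0] + 2.0 * gv * v[0])
        y_new = y + dt * (-g[1] + 2.0 * gv * v[1])
        # Direction: relax toward the lowest-curvature eigenvector.
        vx = v[0] + dt * (-hv[0] + vhv * v[0])
        vy = v[1] + dt * (-hv[1] + vhv * v[1])
        norm = math.hypot(vx, vy)
        v = (vx / norm, vy / norm)
        x, y = x_new, y_new
    return x, y, v


# Start inside the basin of the minimum at (1, 0); the trajectory climbs
# out and settles at the saddle point (0, 0) between the two wells.
x_end, y_end, v_end = gad(0.5, 0.3, (1.0, 0.2))
```

Plain gradient descent from the same starting point would slide down to the minimum at (1, 0). Reversing the force along `v` is what turns the saddle into an attractor in this sketch.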

preprint2010arXiv

Effective Maxwell equations from time-dependent density functional theory

The behavior of interacting electrons in a perfect crystal under macroscopic external electric and magnetic fields is studied. Effective Maxwell equations for the macroscopic electric and magnetic fields are derived starting from time-dependent density functional theory. Effective permittivity and permeability coefficients are obtained.

preprint2008arXiv

Multipole Representation of the Fermi Operator with Application to the Electronic Structure Analysis of Metallic Systems

We propose a multipole representation of the Fermi-Dirac function and the Fermi operator, and use this representation to develop algorithms for electronic structure analysis of metallic systems. The new algorithm is quite simple and efficient. Its computational cost scales logarithmically with $\beta\Delta\epsilon$ where $\beta$ is the inverse temperature, and $\Delta\epsilon$ is the width of the spectrum of the discretized Hamiltonian matrix.

preprint2000arXiv

Invariant measures for Burgers equation with stochastic forcing

In this paper we study the following Burgers equation $\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\left(\frac{u^2}{2}\right) = \epsilon \frac{\partial^2 u}{\partial x^2} + f(x,t)$, where $f(x,t) = \frac{\partial F}{\partial x}(x,t)$ is a random forcing function, which is periodic in $x$ and white noise in $t$. We prove the existence and uniqueness of an invariant measure by establishing a "one force, one solution" principle, namely that for almost every realization of the force, there is a unique distinguished solution that exists for the time interval $(-\infty, +\infty)$ and this solution attracts all other solutions with the same forcing. This is done by studying the so-called one-sided minimizers. We also give a detailed description of the structure and regularity properties for the stationary solutions. In particular, we prove, under some non-degeneracy conditions on the forcing, that almost surely there is a unique main shock and a unique global minimizer for the stationary solutions. Furthermore the global minimizer is a hyperbolic trajectory of the underlying system of characteristics.

46 published item(s)

Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base

Progressive Optimal Path Sampling for Closed-Loop Optimal Control Design with Deep Neural Networks

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

A deep potential model with long-range electrostatic interactions

A Machine Learning Enhanced Algorithm for the Optimal Landing Problem

A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

Bridging Traditional and Machine Learning-based Algorithms for Solving PDEs: The Random Feature Method

Empowering Optimal Control with Machine Learning: A Perspective from Model Predictive Control

Generalization and Memorization: The Bias Potential Model

MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

A mathematical model for universal semantics

A Priori Estimates of the Population Risk for Two-layer Neural Networks

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

Coarse-grained spectral projection (CGSP): a deep learning-assisted approach to quantum unitary dynamics

Deep neural network for the dielectric response of insulators

Ground state energy functional with Hartree-Fock efficiency and chemical accuracy

Integrating Machine Learning with Physics-Based Modeling

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

The Slow Deterioration of the Generalization Error of the Random Feature Model

DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models

A Mean-Field Optimal Control Formulation of Deep Learning

End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems

Solving high-dimensional partial differential equations using deep learning

Solving Many-Electron Schrödinger Equation Using Deep Neural Networks

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

Deep Potential: a general representation of a many-body potential energy surface

Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

Convolutional neural networks with low-rank regularization

Deep Learning Approximation for Stochastic Control Problems

Functional Frank-Wolfe Boosting for General Loss Functions

Multiscale Adaptive Representation of Signals: I. The Basic Framework

Noisy Hegselmann-Krause Systems: Phase Transition and the 2R-Conjecture

Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory

Exact renormalization group analysis of turbulent transport by the shear flow

Modified string method for finding minimum energy path

Atomistic simulations of rare events using gentlest ascent dynamics

Cauchy-Born rule and spin density wave for the spin-polarized Thomas-Fermi-Dirac-von Weizsacker model

The Gentlest Ascent Dynamics

Effective Maxwell equations from time-dependent density functional theory

Multipole Representation of the Fermi Operator with Application to the Electronic Structure Analysis of Metallic Systems

Invariant measures for Burgers equation with stochastic forcing