Researcher profile

Wuchen Li

Wuchen Li contributes to research discovery and scholarly infrastructure.

Researcher, Affiliation not imported, Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
26 works
0 followers
19 topics
4 close collaborators

Published work

26 published item(s)

preprint, 2026, arXiv

A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations

We propose a scalable preconditioned primal-dual hybrid gradient algorithm for solving partial differential equations (PDEs). We multiply the PDE with a dual test function to obtain an inf-sup problem whose loss functional involves lower-order differential operators. The Primal-Dual Hybrid Gradient (PDHG) algorithm is then leveraged for this saddle point problem. By introducing suitable preconditioning operators to the proximal steps in the PDHG algorithm, we obtain an alternative natural gradient ascent-descent optimization scheme for updating the neural network parameters. We apply the Krylov subspace method (MINRES) to evaluate the natural gradients efficiently. Such treatment readily handles the inversion of preconditioning matrices via matrix-vector multiplication. An a posteriori convergence analysis is established for the time-continuous version of the proposed algorithm for general linear PDEs. By incorporating appropriate boundary loss terms, we further obtain a refined a priori convergence result for elliptic equations in divergence form. The algorithm is tested on various types of PDEs with dimensions ranging from $1$ to $50$, including linear and nonlinear elliptic equations, reaction-diffusion equations, and Monge-Ampère equations stemming from the $L^2$ optimal transport problems. We compare the performance of the proposed method with several commonly used deep learning algorithms such as physics-informed neural networks (PINNs), the DeepRitz method and weak adversarial networks (WANs) using either the Adam or the L-BFGS optimizer. The numerical results suggest that the proposed method performs efficiently and robustly and converges more stably with higher accuracy.
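
The idea of evaluating a natural gradient through matrix-vector products alone can be sketched as follows. This is an illustration, not the paper's implementation: it swaps in a hand-written conjugate gradient solver (another Krylov method) for MINRES to stay dependency-free, and it assumes a symmetric positive definite metric of the form $J^T J + \epsilon I$. The names `natural_gradient`, `J`, `g`, and `damping` are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=500):
    """Solve G x = b for symmetric positive definite G, given only v -> G v."""
    x = np.zeros_like(b)
    r = b.copy()                 # residual b - G x for the initial guess x = 0
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Gp = matvec(p)
        alpha = rs / (p @ Gp)
        x += alpha * p
        r -= alpha * Gp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def natural_gradient(jacobian, grad, damping=1e-3):
    """Hypothetical helper: (J^T J + damping * I)^{-1} grad, never forming the matrix."""
    def matvec(v):
        return jacobian.T @ (jacobian @ v) + damping * v
    return conjugate_gradient(matvec, grad)

rng = np.random.default_rng(0)
J = rng.standard_normal((50, 10))   # stand-in Jacobian of outputs w.r.t. parameters
g = rng.standard_normal(10)         # stand-in Euclidean gradient of the loss
d = natural_gradient(J, g)          # preconditioned (natural gradient) direction
```

Only products with `J` and `J.T` are needed, so the same pattern applies when the Jacobian is available only through automatic differentiation.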

preprint, 2026, arXiv

Accelerated Regularized Wasserstein Proximal Sampling Algorithms

We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consider a second-order score-based ODE, similar to Nesterov acceleration. In contrast to traditional kernel density score estimation, we use the recently proposed regularized Wasserstein proximal method, yielding the Accelerated Regularized Wasserstein Proximal method (ARWP). We provide a detailed analysis of continuous- and discrete-time non-asymptotic and asymptotic mixing rates for Gaussian initial and target distributions, using techniques from Euclidean acceleration and accelerated information gradients. Compared with the kinetic Langevin sampling algorithm, the proposed algorithm exhibits a higher contraction rate in the asymptotic time regime. Numerical experiments are conducted on various low-dimensional examples, including multi-modal Gaussian mixtures and ill-conditioned Rosenbrock distributions. ARWP exhibits structured and convergent particles, accelerated discrete-time mixing, and faster tail exploration than the non-accelerated regularized Wasserstein proximal method and kinetic Langevin methods. Additionally, ARWP particles exhibit better generalization properties for some non-log-concave Bayesian neural network tasks.

preprint, 2022, arXiv

A primal-dual approach for solving conservation laws with implicit in time approximations

In this work, we propose a novel framework for the numerical solution of time-dependent conservation laws with implicit schemes via primal-dual hybrid gradient methods. We solve an initial value problem (IVP) for the partial differential equation (PDE) by casting it as a saddle point of a min-max problem and using iterative optimization methods to find the saddle point. Our approach is flexible with the choice of both time and spatial discretization schemes. It benefits from the implicit structure and gains large regions of stability, and overcomes the restriction on the mesh size in time by explicit schemes from Courant--Friedrichs--Lewy (CFL) conditions (really via von Neumann stability analysis). Nevertheless, it is highly parallelizable and easy-to-implement. In particular, no nonlinear inversions are required! Specifically, we illustrate our approach using the finite difference scheme and discontinuous Galerkin method for the spatial scheme; backward Euler and backward differentiation formulas for implicit discretization in time. Numerical experiments illustrate the effectiveness and robustness of the approach. In future work, we will demonstrate that our idea of replacing an initial-value evolution equation with this primal-dual hybrid gradient approach has great advantages in many other situations.
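
A minimal sketch of the saddle-point idea, under simplifying assumptions: it takes one backward Euler step of the linear 1D heat equation (a stand-in chosen here; the paper targets conservation laws) and solves the implicit system $A u = b$ as $\min_u \max_\phi \langle \phi, A u - b \rangle$ with plain, unpreconditioned PDHG. The function names, grid, and step sizes are hypothetical.

```python
import numpy as np

def backward_euler_matrix(n, dt, dx):
    """A = I - dt * L, with L the 1D Dirichlet Laplacian on n interior nodes."""
    lap = (np.diag(-2.0 * np.ones(n))
           + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx**2
    return np.eye(n) - dt * lap

def pdhg_linear_solve(A, b, iters=3000):
    """Find the saddle point of min_u max_phi <phi, A u - b> with plain PDHG."""
    step = 0.95 / np.linalg.norm(A, 2)   # tau = sigma, so tau * sigma * ||A||^2 < 1
    u = np.zeros_like(b)
    u_bar = u.copy()
    phi = np.zeros_like(b)
    for _ in range(iters):
        phi = phi + step * (A @ u_bar - b)   # dual ascent on the scheme residual
        u_next = u - step * (A.T @ phi)      # primal descent
        u_bar = 2.0 * u_next - u             # extrapolation step
        u = u_next
    return u

n = 32
dx = 1.0 / (n + 1)
dt = 1e-3
x = np.linspace(dx, 1.0 - dx, n)
u_old = np.sin(np.pi * x)                # solution at the current time level
A = backward_euler_matrix(n, dt, dx)
u_new = pdhg_linear_solve(A, u_old)      # solution at the next time level
```

Each iteration uses only products with `A` and its transpose, so no linear or nonlinear system is inverted inside the loop; the matrix norm is computed once to set the step sizes.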

preprint, 2022, arXiv

Accelerated Information Gradient flow

We present a framework for Nesterov's accelerated gradient flows in probability space to design efficient mean-field Markov chain Monte Carlo (MCMC) algorithms for Bayesian inverse problems. Here four examples of information metrics are considered, including Fisher-Rao metric, Wasserstein-2 metric, Kalman-Wasserstein metric and Stein metric. For both Fisher-Rao and Wasserstein-2 metrics, we prove convergence properties of accelerated gradient flows. In implementations, we propose a sampling-efficient discrete-time algorithm for Wasserstein-2, Kalman-Wasserstein and Stein accelerated gradient flows with a restart technique. We also formulate a kernel bandwidth selection method, which learns the gradient of logarithm of density from Brownian-motion samples. Numerical experiments, including Bayesian logistic regression and Bayesian neural network, show the strength of the proposed methods compared with state-of-the-art algorithms.

preprint, 2022, arXiv

Computational Mean-field information dynamics associated with Reaction diffusion equations

We formulate and compute a class of mean-field information dynamics for reaction-diffusion equations. Given a class of nonlinear reaction-diffusion equations and entropy type Lyapunov functionals, we study their gradient flow formulations with generalized optimal transport metrics and mean-field control problems. We apply the primal-dual hybrid gradient algorithm to compute the mean-field control problems with potential energies. A byproduct of the proposed method is a new and efficient variational scheme for solving implicit in time schemes of mean-field control problems. Several numerical examples demonstrate the solutions of mean-field control problems.

preprint, 2022, arXiv

Controlling conservation laws II: compressible Navier-Stokes equations

We propose, study, and compute solutions to a class of optimal control problems for hyperbolic systems of conservation laws and their viscous regularization. We take barotropic compressible Navier--Stokes equations (BNS) as a canonical example. We first apply the entropy--entropy flux--metric condition for BNS. We select an entropy function and rewrite BNS to a summation of flux and metric gradient of entropy. We then develop a metric variational problem for BNS, whose critical points form a primal-dual BNS system. We design a finite difference scheme for the variational system. The numerical approximations of conservation laws are implicit in time. We solve the variational problem with an algorithm inspired by the primal-dual hybrid gradient method. This includes a new method for solving implicit time approximations for conservation laws, which seems to be unconditionally stable. Several numerical examples are presented to demonstrate the effectiveness of the proposed algorithm.

preprint, 2022, arXiv

Mean field control problems for vaccine distribution

With the invention of the COVID-19 vaccine, shipping and distribution are crucial in controlling the pandemic. In this paper, we build a mean-field variational problem in a spatial domain, which controls the propagation of the pandemic by the optimal transportation strategy of vaccine distribution. Here we integrate the vaccine distribution into the mean-field SIR model designed in our previous paper arXiv:2006.01249. Numerical examples demonstrate that the proposed model provides practical strategies in vaccine distribution on a spatial domain.

preprint, 2022, arXiv

Mean field information Hessian matrices on graphs

We derive mean-field information Hessian matrices on finite graphs. The "information" refers to entropy functions on the probability simplex. And the "mean-field" means nonlinear weight functions of probabilities supported on graphs. These two concepts define a mean-field optimal transport type metric. In this metric space, we first derive Hessian matrices of energies on graphs, including linear, interaction energies, entropies. We name their smallest eigenvalues as mean-field Ricci curvature bounds on graphs. We next provide examples on two-point spaces and graph products. We last present several applications of the proposed matrices. E.g., we prove discrete Costa's entropy power inequalities on a two-point space.

preprint, 2022, arXiv

Mean field Kuramoto models on graphs

One classical synchronization model is the Kuramoto model. We propose both first- and second-order Kuramoto dynamical models on graphs using discrete optimal transport dynamics. We analyze the synchronization behaviors for some examples of Kuramoto models on graphs. We also provide a generalized Hopf-Cole transformation for discrete optimal transport systems. Focusing on the two-point graph, we derive analytical formulas of the Kuramoto dynamics with various potentials induced from entropy functionals. Several numerical examples for the Kuramoto model on general graphs are presented.

preprint, 2022, arXiv

Neural Parametric Fokker-Planck Equations

In this paper, we develop and analyze numerical methods for high dimensional Fokker-Planck equations by leveraging generative models from deep learning. Our starting point is a formulation of the Fokker-Planck equation as a system of ordinary differential equations (ODEs) on finite-dimensional parameter space with the parameters inherited from generative models such as normalizing flows. We call such ODEs neural parametric Fokker-Planck equations. The fact that the Fokker-Planck equation can be viewed as the $L^2$-Wasserstein gradient flow of Kullback-Leibler (KL) divergence allows us to derive the ODEs as the constrained $L^2$-Wasserstein gradient flow of KL divergence on the set of probability densities generated by neural networks. For numerical computation, we design a variational semi-implicit scheme for the time discretization of the proposed ODE. Such an algorithm is sampling-based, which can readily handle the Fokker-Planck equations in higher dimensional spaces. Moreover, we also establish bounds for the asymptotic convergence analysis of the neural parametric Fokker-Planck equation as well as the error analysis for both the continuous and discrete versions. Several numerical examples are provided to illustrate the performance of the proposed algorithms and analysis.
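
The gradient-flow identity this abstract builds on can be written out as a short worked derivation. This is the standard statement for a potential $V$ with unit diffusion and target density $\pi \propto e^{-V}$; the symbols $V$, $\pi$, and $\rho_t$ are notation assumed here rather than taken from the abstract.

```latex
\begin{align*}
\partial_t \rho_t
  &= \nabla\cdot(\rho_t \nabla V) + \Delta \rho_t
   = \nabla\cdot\Big(\rho_t \nabla \log\frac{\rho_t}{\pi}\Big),
   \qquad \pi \propto e^{-V}, \\
\frac{\delta}{\delta\rho}\,\mathrm{KL}(\rho\,\|\,\pi)
  &= \log\frac{\rho}{\pi} + 1, \\
\frac{d}{dt}\,\mathrm{KL}(\rho_t\,\|\,\pi)
  &= -\int \Big|\nabla \log\frac{\rho_t}{\pi}\Big|^2 \rho_t \, dx \;\le\; 0 .
\end{align*}
```

The first line is the Fokker-Planck equation rewritten as $\partial_t \rho_t = \nabla\cdot\big(\rho_t \nabla \tfrac{\delta}{\delta\rho}\mathrm{KL}\big)$, which is the $L^2$-Wasserstein gradient flow form; the last line shows that the KL divergence decreases along the flow.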

preprint, 2022, arXiv

Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization

The computation of Wasserstein gradient direction is essential for posterior sampling problems and scientific computing. The approximation of the Wasserstein gradient with finite samples requires solving a variational problem. We study the variational problem in the family of two-layer networks with squared-ReLU activations, towards which we derive a semi-definite programming (SDP) relaxation. This SDP can be viewed as an approximation of the Wasserstein gradient in a broader function family including two-layer networks. By solving the convex SDP, we obtain the optimal approximation of the Wasserstein gradient direction in this class of functions. Numerical experiments including PDE-constrained Bayesian inference and parameter estimation in COVID-19 modeling demonstrate the effectiveness of the proposed method.

preprint, 2021, arXiv

A Fast Proximal Gradient Method and Convergence Analysis for Dynamic Mean Field Planning

In this paper, we propose an efficient and flexible algorithm to solve dynamic mean-field planning problems based on an accelerated proximal gradient method. Besides an easy-to-implement gradient descent step in this algorithm, a crucial projection step becomes solving an elliptic equation whose solution can be obtained by conventional methods efficiently. By induction on iterations used in the algorithm, we theoretically show that the proposed discrete solution converges to the underlying continuous solution as the grid size increases. Furthermore, we generalize our algorithm to mean-field game problems and accelerate it using multilevel and multigrid strategies. We conduct comprehensive numerical experiments to confirm the convergence analysis of the proposed algorithm, to show its efficiency and mass preservation property by comparing it with state-of-the-art methods, and to illustrate its flexibility for handling various mean-field variational problems.

preprint, 2021, arXiv

Projected Wasserstein gradient descent for high-dimensional Bayesian inference

We propose a projected Wasserstein gradient descent method (pWGD) for high-dimensional Bayesian inference problems. The underlying density function of a particle system of WGD is approximated by kernel density estimation (KDE), which faces the long-standing curse of dimensionality. We overcome this challenge by exploiting the intrinsic low-rank structure in the difference between the posterior and prior distributions. The parameters are projected into a low-dimensional subspace to alleviate the approximation error of KDE in high dimensions. We formulate a projected Wasserstein gradient flow and analyze its convergence property under mild assumptions. Several numerical experiments illustrate the accuracy, convergence, and complexity scalability of pWGD with respect to parameter dimension, sample size, and processor cores.

preprint, 2021, arXiv

Wasserstein Proximal of GANs

We introduce a new method for training generative adversarial networks by applying the Wasserstein-2 metric proximal on the generators. The approach is based on Wasserstein information geometry. It defines a parametrization invariant natural gradient by pulling back optimal transport structures from probability space to parameter space. We obtain easy-to-implement iterative regularizers for the parameter updates of implicit deep generative models. Our experiments demonstrate that this method improves the speed and stability of training in terms of wall-clock time and Fréchet Inception Distance.

preprint, 2020, arXiv

A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems

Mean field games (MFG) and mean field control (MFC) are critical classes of multi-agent models for efficient analysis of massive populations of interacting agents. Their areas of application span topics in economics, finance, game theory, industrial engineering, crowd motion, and more. In this paper, we provide a flexible machine learning framework for the numerical solution of potential MFG and MFC models. State-of-the-art numerical methods for solving such problems utilize spatial discretization that leads to a curse-of-dimensionality. We approximately solve high-dimensional problems by combining Lagrangian and Eulerian viewpoints and leveraging recent advances from machine learning. More precisely, we work with a Lagrangian formulation of the problem and enforce the underlying Hamilton-Jacobi-Bellman (HJB) equation that is derived from the Eulerian formulation. Finally, a tailored neural network parameterization of the MFG/MFC solution helps us avoid any spatial discretization. Our numerical results include the approximate solution of 100-dimensional instances of optimal transport and crowd motion problems on a standard work station and a validation using an Eulerian solver in two dimensions. These results open the door to much-anticipated applications of MFG and MFC models that were beyond reach with existing numerical methods.

preprint, 2020, arXiv

A mean field game inverse problem

Mean-field games arise in various fields including economics, engineering, and machine learning. They study strategic decision making in large populations where the individuals interact via certain mean-field quantities. The ground metrics and running costs of the games are of essential importance but are often unknown or only partially known. In this paper, we propose mean-field game inverse-problem models to reconstruct the ground metrics and interaction kernels in the running costs. The observations are the macro motions, to be specific, the density distribution, and the velocity field of the agents. They can be corrupted by noise to some extent. Our models are PDE constrained optimization problems, which are solvable by first-order primal-dual methods. Besides, we apply Bregman iterations to find the optimal model parameters. We numerically demonstrate that our model is both efficient and robust to noise.

preprint, 2020, arXiv

Computational methods for nonlocal mean field games with applications

We introduce a novel framework to model and solve mean-field game systems with nonlocal interactions. Our approach relies on kernel-based representations of mean-field interactions and feature-space expansions in the spirit of kernel methods in machine learning. We demonstrate the flexibility of our approach by modeling various interaction scenarios between agents. Additionally, our method yields a computationally efficient saddle-point reformulation of the original problem that is amenable to state-of-the-art convex optimization methods such as the primal-dual hybrid gradient method (PDHG). We also discuss potential applications of our methods to multi-agent trajectory planning problems.

preprint, 2020, arXiv

Information Newton's flow: second-order optimization method in probability space

We introduce a framework for Newton's flows in probability space with information metrics, named information Newton's flows. Here two information metrics are considered, including both the Fisher-Rao metric and the Wasserstein-2 metric. A known fact is that overdamped Langevin dynamics correspond to Wasserstein gradient flows of Kullback-Leibler (KL) divergence. Extending this fact to Wasserstein Newton's flows, we derive Newton's Langevin dynamics. We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families. For the numerical implementation, we design sampling efficient variational methods in affine models and reproducing kernel Hilbert space (RKHS) to approximate Wasserstein Newton's directions. We also establish convergence results of the proposed information Newton's method with approximated directions. Several numerical examples from Bayesian sampling problems are shown to demonstrate the effectiveness of the proposed method.

preprint, 2020, arXiv

Kernelized Wasserstein Natural Gradient

Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family, and thus can yield more effective optimization. Unfortunately, computing the natural gradient is challenging as it requires inverting a high dimensional matrix at each iteration. We propose a general framework to approximate the natural gradient for the Wasserstein metric, by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach leads to an estimator for gradient direction that can trade-off accuracy and computational cost, with theoretical guarantees. We verify its accuracy on simple examples, and show the advantage of using such an estimator in classification tasks on Cifar10 and Cifar100 empirically.

preprint, 2020, arXiv

Quantum statistical learning via Quantum Wasserstein natural gradient

In this article, we introduce a new approach towards the statistical learning problem $\operatorname{argmin}_{ρ(θ) \in \mathcal P_θ} W_{Q}^2 (ρ_{\star},ρ(θ))$ to approximate a target quantum state $ρ_{\star}$ by a set of parametrized quantum states $ρ(θ)$ in a quantum $L^2$-Wasserstein metric. We solve this estimation problem by considering Wasserstein natural gradient flows for density operators on finite-dimensional $C^*$ algebras. For continuous parametric models of density operators, we pull back the quantum Wasserstein metric such that the parameter space becomes a Riemannian manifold with quantum Wasserstein information matrix. Using a quantum analogue of the Benamou-Brenier formula, we derive a natural gradient flow on the parameter space. We also discuss certain continuous-variable quantum states by studying the transport of the associated Wigner probability distributions.

preprint, 2020, arXiv

Ricci curvature for parametric statistics via optimal transport

We elaborate the notion of a Ricci curvature lower bound for parametrized statistical models. Following the seminal ideas of Lott-Sturm-Villani, we define this notion based on the geodesic convexity of the Kullback-Leibler divergence in a Wasserstein statistical manifold, that is, a manifold of probability distributions endowed with a Wasserstein metric tensor structure. Within these definitions, the Ricci curvature is related to both information geometry and Wasserstein geometry. These definitions allow us to formulate bounds on the convergence rate of Wasserstein gradient flows and information functional inequalities in parameter space. We discuss examples of Ricci curvature lower bounds and convergence rates in exponential family models.

preprint, 2020, arXiv

Transport information geometry I: Riemannian calculus on probability simplex

We formulate the Riemannian calculus of the probability set embedded with $L^2$-Wasserstein metric. This is an initial work of transport information geometry. Our investigation starts with the probability simplex (probability manifold) supported on vertices of a finite graph. The main idea is to embed the probability manifold as a submanifold of the positive measure space with a nonlinear metric tensor. Here the nonlinearity comes from the linear weighted Laplacian operator. By this viewpoint, we establish torsion-free Christoffel symbols, Levi-Civita connections, curvature tensors and volume forms in the probability manifold by Euclidean coordinates. As a consequence, the Jacobi equation, Laplace-Beltrami and Hessian operators on the probability manifold are derived. These geometric computations are also provided in the infinite-dimensional density space (density manifold) supported on a finite-dimensional manifold. In particular, an identity is given connecting the Bakry-Émery $Γ_2$ operator (carré du champ itéré) by connecting Fisher-Rao information metric and optimal transport metric. Several examples are demonstrated.

preprint, 2020, arXiv

Wasserstein information matrix

We study information matrices for statistical models by the $L^2$-Wasserstein metric. We call them Wasserstein information matrices (WIMs), which are analogs of classical Fisher information matrices. We introduce Wasserstein score functions and study covariance operators in statistical models. Using them, we establish Wasserstein-Cramer-Rao bounds for estimations and explore their comparisons with classical results. We next consider the asymptotic behaviors and efficiency of estimators. We derive the on-line asymptotic efficiency for Wasserstein natural gradient. Besides, we study a Poincaré efficiency for Wasserstein natural gradient of maximum likelihood estimation. Several analytical examples of WIMs are presented, including location-scale families, independent families, and rectified linear unit (ReLU) generative models.
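
For a concrete feel of how a Wasserstein information matrix differs from the Fisher information matrix, here is a small sketch for the one-dimensional Gaussian location-scale family $N(\mu, \sigma^2)$ in coordinates $(\mu, \sigma)$. It assumes the one-dimensional transport-map representation, in which the Wasserstein metric tensor is $E[\partial_\theta T \, \partial_\theta T^T]$ with $T(z) = \mu + \sigma z$ and $z \sim N(0, 1)$; this shortcut and the function name are choices made for illustration, not the paper's general construction.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_information_matrices(sigma, n_nodes=20):
    """Return (Wasserstein, Fisher) information matrices in (mu, sigma) for N(mu, sigma^2)."""
    z, w = hermegauss(n_nodes)          # quadrature for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)        # normalize to expectations under N(0, 1)

    # Transport map T(z) = mu + sigma * z, so dT/dmu = 1 and dT/dsigma = z.
    d_transport = np.stack([np.ones_like(z), z])
    wasserstein = (d_transport * w) @ d_transport.T

    # Classical score at x = mu + sigma * z:
    # d/dmu log p = z / sigma, d/dsigma log p = (z^2 - 1) / sigma.
    score = np.stack([z / sigma, (z**2 - 1.0) / sigma])
    fisher = (score * w) @ score.T
    return wasserstein, fisher

G_W, G_F = gaussian_information_matrices(sigma=2.0)
```

In these coordinates `G_W` is the identity matrix, while `G_F` is $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$, so the two natural gradients rescale the Euclidean gradient differently as $\sigma$ changes.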

preprint, 2019, arXiv

Fisher information regularization schemes for Wasserstein gradient flows

We propose a variational scheme for computing Wasserstein gradient flows. The scheme builds upon the Jordan--Kinderlehrer--Otto framework with the Benamou-Brenier dynamic formulation of the quadratic Wasserstein metric and adds a regularization by the Fisher information. This regularization can be derived in terms of energy splitting and is closely related to the Schrödinger bridge problem. It improves the convexity of the variational problem and automatically preserves the non-negativity of the solution. As a result, it allows us to apply sequential quadratic programming to solve the sub-optimization problem. We further save the computational cost by showing that no additional time interpolation is needed in the underlying dynamic formulation of the Wasserstein-2 metric, and therefore, the dimension of the problem is vastly reduced. Several numerical examples, including porous media equation, nonlinear Fokker-Planck equation, aggregation diffusion equation, and Derrida-Lebowitz-Speer-Spohn equation, are provided. These examples demonstrate the simplicity and stability of the proposed scheme.