Researcher profile

Yulong Lu

Yulong Lu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
12 topics
4 close collaborators

Published work

8 published items

preprint, 2026, arXiv

A Theory of Diversity for Random Matrices with Applications to In-Context Learning of Schrödinger Equations

We address the following question: given a collection $\{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)}\}$ of independent $d \times d$ random matrices drawn from a common distribution $\mathbb{P}$, what is the probability that the centralizer of $\{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)}\}$ is trivial? We provide lower bounds on this probability in terms of the sample size $N$ and the dimension $d$ for several families of random matrices which arise from the discretization of linear Schrödinger operators with random potentials. When combined with recent work on machine learning theory, our results provide guarantees on the generalization ability of transformer-based neural networks for in-context learning of Schrödinger equations.
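To make the question above concrete, the following is a minimal numerical sketch, not the paper's method. It computes the dimension of the centralizer of a few matrices by solving the linear system $\mathbf{A}^{(k)}\mathbf{B} - \mathbf{B}\mathbf{A}^{(k)} = 0$ for all $k$; the centralizer is trivial when this dimension is 1 (only scalar multiples of the identity). The function names, the finite-difference discretization, the uniform random potential, and the tolerance are all assumptions chosen for illustration.

```python
import numpy as np


def centralizer_dimension(matrices, tol=1e-9):
    """Dimension of {B : A B = B A for every A in matrices}."""
    d = matrices[0].shape[0]
    eye = np.eye(d)
    # With column-major vec, vec(A B - B A) = (I kron A - A^T kron I) vec(B).
    blocks = [np.kron(eye, A) - np.kron(A.T, eye) for A in matrices]
    system = np.vstack(blocks)
    singular_values = np.linalg.svd(system, compute_uv=False)
    # Numerical rank: count singular values above a relative threshold.
    threshold = tol * max(singular_values[0], 1.0)
    rank = int(np.sum(singular_values > threshold))
    return d * d - rank


def discrete_schrodinger(d, rng):
    """Hypothetical 1D model: finite-difference Laplacian plus a random potential."""
    laplacian = 2.0 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1)
    potential = rng.uniform(0.0, 1.0, size=d)  # assumed potential distribution
    return laplacian + np.diag(potential)


rng = np.random.default_rng(0)
d = 6
A1 = discrete_schrodinger(d, rng)
A2 = discrete_schrodinger(d, rng)

dim_one = centralizer_dimension([A1])       # a single matrix commutes with its own powers
dim_two = centralizer_dimension([A1, A2])   # two independent draws
print(dim_one, dim_two)
```

In this toy model a single matrix always has a nontrivial centralizer, since it commutes with every polynomial in itself, while two independent draws generically leave only the scalar matrices. The paper quantifies how likely that is as a function of $N$ and $d$.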

preprint, 2026, arXiv

In-Context Operator Learning on the Space of Probability Measures

We introduce \emph{in-context operator learning on probability measure spaces} for optimal transport (OT). The goal is to learn a single solution operator that maps a pair of distributions to the OT map, using only few-shot samples from each distribution as a prompt and \emph{without} gradient updates at inference. We parameterize the solution operator and develop scaling-law theory in two regimes. In the \emph{nonparametric} setting, when tasks concentrate on a low-intrinsic-dimension manifold of source--target pairs, we establish generalization bounds that quantify how in-context accuracy scales with prompt size, intrinsic task dimension, and model capacity. In the \emph{parametric} setting (e.g., Gaussian families), we give an explicit architecture that recovers the exact OT map in context and provide finite-sample excess-risk bounds. Our numerical experiments on synthetic transports and generative-modeling benchmarks validate the framework.
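For the Gaussian case mentioned above, the OT map between $N(m_1, \Sigma_1)$ and $N(m_2, \Sigma_2)$ under quadratic cost has the well-known closed form $T(x) = m_2 + A(x - m_1)$ with $A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$. The sketch below evaluates this formula and applies it to moments estimated from a small sample "prompt". It is a plain plug-in estimator for illustration only, not the architecture described in the abstract; all function names and the sample sizes are assumptions.

```python
import numpy as np


def sqrtm_psd(M):
    """Symmetric square root of a positive semidefinite matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return (eigenvectors * np.sqrt(eigenvalues)) @ eigenvectors.T


def gaussian_ot_matrix(cov_source, cov_target):
    """Linear part A of the quadratic-cost OT map between two Gaussians."""
    root = sqrtm_psd(cov_source)
    root_inv = np.linalg.inv(root)
    return root_inv @ sqrtm_psd(root @ cov_target @ root) @ root_inv


def plug_in_ot_map(source_samples, target_samples):
    """Estimate the Gaussian OT map from samples (rows are observations)."""
    m1 = source_samples.mean(axis=0)
    m2 = target_samples.mean(axis=0)
    A = gaussian_ot_matrix(np.cov(source_samples, rowvar=False),
                           np.cov(target_samples, rowvar=False))
    # A is symmetric, so right-multiplying row vectors by A applies the map.
    return lambda x: m2 + (x - m1) @ A


# Exact check on known covariances: A pushes cov_1 forward to cov_2.
cov_1 = np.array([[2.0, 0.3], [0.3, 1.0]])
cov_2 = np.array([[1.0, -0.4], [-0.4, 3.0]])
A_exact = gaussian_ot_matrix(cov_1, cov_2)

# Few-shot usage with hypothetical sample sizes.
rng = np.random.default_rng(0)
source = rng.multivariate_normal([0.0, 0.0], cov_1, size=50)
target = rng.multivariate_normal([1.0, -1.0], cov_2, size=50)
transport = plug_in_ot_map(source, target)
mapped = transport(source)
```

By construction, the mapped source samples have exactly the empirical mean and covariance of the target samples, which is the sense in which this plug-in map matches the two Gaussian fits.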

preprint, 2026, arXiv

Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

We study the posterior contraction rate of Bayesian Physics-Informed Neural Networks (PINNs) for solving a general class of elliptic partial differential equations (PDEs). We focus on learning of the elliptic equation with a non-homogeneous Dirichlet boundary condition from independent and noisy measurements collected both inside the domain and on the boundary. Assuming that the PDE admits a strong solution in a Hölder space and using a suitably constructed prior on the neural network weights, we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate. Furthermore, the chosen prior is rate-adaptive: the posterior contracts at an (almost) optimal rate without prior knowledge of the smoothness level of the exact solution. Our results provide statistical guarantees for uncertainty quantification of PDEs via Bayesian PINNs.

preprint, 2023, arXiv

Two-Scale Gradient Descent Ascent Dynamics Finds Mixed Nash Equilibria of Continuous Games: A Mean-Field Perspective

Finding the mixed Nash equilibria (MNE) of a two-player zero-sum continuous game is an important and challenging problem in machine learning. A canonical algorithm for finding the MNE is the noisy gradient descent ascent method, which in the infinite particle limit gives rise to the {\em Mean-Field Gradient Descent Ascent} (GDA) dynamics on the space of probability measures. In this paper, we first study the convergence of a two-scale Mean-Field GDA dynamics for finding the MNE of the entropy-regularized objective. More precisely, we show that for each finite temperature (or regularization parameter), the two-scale Mean-Field GDA with a suitable {\em finite} scale ratio converges exponentially to the unique MNE without assuming the convexity or concavity of the interaction potential. The key ingredient of our proof lies in the construction of new Lyapunov functions that dissipate exponentially along the Mean-Field GDA. We further study the simulated annealing of the Mean-Field GDA dynamics. We show that with a temperature schedule that decays logarithmically in time, the annealed Mean-Field GDA converges to the MNE of the original unregularized objective.
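As a rough illustration of the noisy gradient descent ascent method described above, the sketch below runs a finite-particle, Euler-Maruyama version of a two-scale dynamics on the one-dimensional torus. It is a toy under stated assumptions, not the paper's setup: the payoff $K(x, y) = \sin(x)\sin(y)$, the step size, the temperature, the scale ratio, and the function name are all hypothetical choices. The min player's particles descend the averaged payoff, while the max player's particles ascend it on a time scale faster by the factor `scale_ratio`.

```python
import numpy as np


def two_scale_particle_gda(n_particles=64, n_steps=500, dt=1e-2,
                           temperature=0.5, scale_ratio=5.0, seed=0):
    """Noisy particle GDA on the torus [0, 2*pi) for K(x, y) = sin(x) * sin(y)."""
    rng = np.random.default_rng(seed)
    two_pi = 2.0 * np.pi
    x = rng.uniform(0.0, two_pi, n_particles)  # particles of the min player
    y = rng.uniform(0.0, two_pi, n_particles)  # particles of the max player
    dt_y = scale_ratio * dt                    # the ascent runs on a faster clock
    for _ in range(n_steps):
        # Gradients of the payoff averaged over the opponent's particles.
        grad_x = np.cos(x) * np.mean(np.sin(y))
        grad_y = np.mean(np.sin(x)) * np.cos(y)
        noise_x = rng.standard_normal(n_particles)
        noise_y = rng.standard_normal(n_particles)
        # Descent for x, ascent for y; the noise scale sets the temperature.
        x = x - dt * grad_x + np.sqrt(2.0 * temperature * dt) * noise_x
        y = y + dt_y * grad_y + np.sqrt(2.0 * temperature * dt_y) * noise_y
        x = np.mod(x, two_pi)
        y = np.mod(y, two_pi)
    return x, y


x_final, y_final = two_scale_particle_gda()
```

The empirical distributions of `x_final` and `y_final` are particle approximations of the two players' mixed strategies. Lowering `temperature` over time would mimic the annealing discussed in the abstract, though this sketch does not implement a schedule.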

preprint, 2020, arXiv

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks although the optimization landscape is known to be highly non-convex. To understand the success of SGD for training deep neural networks, this work presents a mean-field analysis of deep residual networks, based on a line of works that interpret the continuum limit of the deep residual network as an ordinary differential equation when the network capacity tends to infinity. Specifically, we propose a new continuum limit of deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global. This characterization enables us to derive the first global convergence result for multilayer neural networks in the mean-field regime. Furthermore, without assuming the convexity of the loss landscape, our proof relies on a zero-loss assumption at the global minimizer that can be achieved when the model shares a universal approximation property. Key to our result is the observation that a deep residual network resembles a shallow network ensemble, i.e. a two-layer network. We bound the difference between the shallow network and our ResNet model via the adjoint sensitivity method, which enables us to apply existing mean-field analyses of two-layer networks to deep networks. Furthermore, we propose several novel training schemes based on the new continuous model, including one training procedure that switches the order of the residual blocks and results in strong empirical performance on the benchmark datasets.

preprint, 2020, arXiv

Continuum limit and preconditioned Langevin sampling of the path integral molecular dynamics

We investigate the continuum limit in which the number of beads goes to infinity in the ring polymer representation of thermal averages. Studying the continuum limit of the trajectory sampling equation sheds light on possible preconditioning techniques for sampling ring polymer configurations with a large number of beads. We propose two preconditioned Langevin sampling dynamics, which are shown to have improved stability and sampling accuracy. We present a careful mode analysis of the preconditioned dynamics and show their connections to the normal mode, the staging coordinate and the Matsubara mode representations for ring polymers. In the case where the potential is quadratic, we show that the continuum limit of the preconditioned mass-modified Langevin dynamics converges to its equilibrium exponentially fast, which suggests that the finite-dimensional counterpart has a dimension-independent convergence rate. In addition, the preconditioning techniques can be naturally applied to multi-level quantum systems in the nonadiabatic regime, and are compatible with various numerical approaches.

preprint, 2020, arXiv

Quantitative Propagation of Chaos in the bimolecular chemical reaction-diffusion model

We study a stochastic system of $N$ interacting particles which models bimolecular chemical reaction-diffusion. In this model, each particle $i$ carries two attributes: the spatial location $X_t^i\in \mathbb{T}^d$, and the type $\Xi_t^i\in \{1,\cdots,n\}$. While $X_t^i$ is a standard (independent) diffusion process, the evolution of the type $\Xi_t^i$ is described by pairwise interactions between different particles under a series of chemical reactions described by a chemical reaction network. We prove that in the large particle limit the stochastic dynamics converges to a mean field limit which is described by a nonlocal reaction-diffusion partial differential equation. In particular, we obtain a quantitative propagation of chaos result for the interacting particle system. Our proof is based on the relative entropy method used recently by Jabin and Wang \cite{JW18}. The key ingredient of the relative entropy method is a large deviation estimate for a special partition function, which was proved previously by technical combinatorial estimates. We give a simple probabilistic proof based on a novel martingale argument.

preprint, 2019, arXiv

Geometric ergodicity of Langevin dynamics with Coulomb interactions

This paper is concerned with the long-time behavior of Langevin dynamics of {\em Coulomb gases} in $\mathbf{R}^d$ with $d\geq 2$, that is, a second-order system of Brownian particles driven by an external force and a pairwise repulsive Coulomb force. We prove that the system converges exponentially to the unique Boltzmann-Gibbs invariant measure under a weighted total variation distance. The proof relies on a novel construction of a Lyapunov function for the Coulomb system.