Source author record

Rongjie Lai

Rongjie Lai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.NA, Machine Learning, math.OC, math-ph, math.MP, Numerical Analysis, physics.comp-ph, Computational Geometry, Computer Vision, Artificial Intelligence, Biomolecules, cond-mat.mtrl-sci, math.AP, math.DG, math.SP, quant-ph

Catalog footprint

What is connected

19 works

16 topics

4 close collaborators

Preprint, 2026, arXiv

In-Context Operator Learning on the Space of Probability Measures

We introduce \emph{in-context operator learning on probability measure spaces} for optimal transport (OT). The goal is to learn a single solution operator that maps a pair of distributions to the OT map, using only few-shot samples from each distribution as a prompt and \emph{without} gradient updates at inference. We parameterize the solution operator and develop scaling-law theory in two regimes. In the \emph{nonparametric} setting, when tasks concentrate on a low-intrinsic-dimension manifold of source--target pairs, we establish generalization bounds that quantify how in-context accuracy scales with prompt size, intrinsic task dimension, and model capacity. In the \emph{parametric} setting (e.g., Gaussian families), we give an explicit architecture that recovers the exact OT map in context and provide finite-sample excess-risk bounds. Our numerical experiments on synthetic transports and generative-modeling benchmarks validate the framework.
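As background for the parametric Gaussian case mentioned above: under quadratic cost, the OT map between two Gaussians has a well-known closed form. The abstract does not state it, so the following is standard material rather than the paper's architecture, and the symbols $m_i$, $\Sigma_i$ (means and covariances, with $\Sigma_1$ assumed positive definite) are notation introduced here.

```latex
% OT map from N(m_1, \Sigma_1) to N(m_2, \Sigma_2) under quadratic cost
T(x) = m_2 + A\,(x - m_1), \qquad
A = \Sigma_1^{-1/2}\left(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\right)^{1/2}\Sigma_1^{-1/2}.
```

The matrix $A$ is symmetric positive semidefinite and satisfies $A\,\Sigma_1 A = \Sigma_2$, so $T$ pushes the source Gaussian onto the target.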

Preprint, 2026, arXiv

Learn to Evolve: Self-supervised Neural JKO Operator for Wasserstein Gradient Flow

The Jordan-Kinderlehrer-Otto (JKO) scheme provides a stable variational framework for computing Wasserstein gradient flows, but its practical use is often limited by the high computational cost of repeatedly solving the JKO subproblems. We propose a self-supervised approach for learning a JKO solution operator without requiring numerical solutions of any JKO trajectories. The learned operator maps an input density directly to the minimizer of the corresponding JKO subproblem, and can be iteratively applied to efficiently generate the gradient-flow evolution. A key challenge is that only a number of initial densities are typically available for training. To address this, we introduce a Learn-to-Evolve algorithm that jointly learns the JKO operator and its induced trajectories by alternating between trajectory generation and operator updates. As training progresses, the generated data increasingly approximates true JKO trajectories. Meanwhile, this Learn-to-Evolve strategy serves as a natural form of data augmentation, significantly enhancing the generalization ability of the learned operator. Numerical experiments demonstrate the accuracy, stability, and robustness of the proposed method across various choices of energies and initial conditions.

Preprint, 2026, arXiv

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.

Preprint, 2025, arXiv

Self-Supervised Amortized Neural Operators for Optimal Control: Scaling Laws and Applications

Optimal control provides a principled framework for transforming dynamical system models into intelligent decision-making, yet classical computational approaches are often too expensive for real-time deployment in dynamic or uncertain environments. In this work, we propose a method based on self-supervised neural operators for open-loop optimal control problems. It offers a new paradigm by directly approximating the mapping from system conditions to optimal control strategies, enabling instantaneous inference across diverse scenarios once trained. We further extend this framework to more complex settings, including dynamic or partially observed environments, by integrating the learned solution operator with Model Predictive Control (MPC). This yields a solution-operator learning method for closed-loop control, in which the learned operator supplies rapid predictions that replace the potentially time-consuming optimization step in conventional MPC. This acceleration comes with a quantifiable price to pay. Theoretically, we derive scaling laws that relate generalization error and sample/model complexity to the intrinsic dimension of the problem and the regularity of the optimal control function. Numerically, case studies show efficient, accurate real-time performance in low-intrinsic-dimension regimes, while accuracy degrades as problem complexity increases. Together, these results provide a balanced perspective: neural operators are a powerful novel tool for high-performance control when hidden low-dimensional structure can be exploited, yet they remain fundamentally constrained by the intrinsic dimensional complexity in more challenging settings.

Preprint, 2022, arXiv

Learning Geometrically Disentangled Representations of Protein Folding Simulations

Massive molecular simulations of drug-target proteins have been used as a tool to understand disease mechanisms and develop therapeutics. This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein, e.g. SARS-CoV-2 Spike protein, obtained from computationally expensive molecular simulations. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules, as well as efficient generation of protein conformations that can serve as a complement to a molecular simulation engine. Specifically, we present a geometric autoencoder framework to learn separate latent space encodings of the intrinsic and extrinsic geometries of the protein structure. For this purpose, the proposed Protein Geometric AutoEncoder (ProGAE) model is trained on the protein contact map and the orientation of the backbone bonds of the protein. Using ProGAE latent embeddings, we reconstruct and generate the conformational ensemble of a protein at or near the experimental resolution, while gaining better interpretability and controllability in terms of protein structure generation from the learned latent space. Additionally, ProGAE models are transferable to a different state of the same protein or to a new protein of different size, where only the dense layer decoding from the latent representation needs to be retrained. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations, charting the path toward scalable and improved approaches for analyzing and enhancing high-cost simulations of drug-target proteins.

Preprint, 2022, arXiv

Quasi-Equivalence of Width and Depth of Neural Networks

While classic studies proved that wide networks allow universal approximation, recent research and successes of deep learning demonstrate the power of deep networks. Based on a symmetric consideration, we investigate if the design of artificial neural networks should have a directional preference, and what the mechanism of interaction is between the width and depth of a network. Inspired by the De Morgan law, we address this fundamental question by establishing a quasi-equivalence between the width and depth of ReLU networks in two aspects. First, we formulate two transforms for mapping an arbitrary ReLU network to a wide network and a deep network respectively for either regression or classification so that the essentially same capability of the original network can be implemented. Then, we replace the mainstream artificial neuron type with a quadratic counterpart, and utilize the factorization and continued fraction representations of the same polynomial function to construct a wide network and a deep network, respectively. Based on our findings, a deep network has a wide equivalent, and vice versa, subject to an arbitrarily small error.

Preprint, 2021, arXiv

A Fast Proximal Gradient Method and Convergence Analysis for Dynamic Mean Field Planning

In this paper, we propose an efficient and flexible algorithm to solve dynamic mean-field planning problems based on an accelerated proximal gradient method. Besides an easy-to-implement gradient descent step in this algorithm, a crucial projection step becomes solving an elliptic equation whose solution can be obtained by conventional methods efficiently. By induction on iterations used in the algorithm, we theoretically show that the proposed discrete solution converges to the underlying continuous solution as the grid size increases. Furthermore, we generalize our algorithm to mean-field game problems and accelerate it using multilevel and multigrid strategies. We conduct comprehensive numerical experiments to confirm the convergence analysis of the proposed algorithm, to show its efficiency and mass preservation property by comparing it with state-of-the-art methods, and to illustrate its flexibility for handling various mean-field variational problems.

Preprint, 2021, arXiv

NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds

Convolution plays a crucial role in various applications in signal and image processing, analysis, and recognition. It is also the main building block of convolutional neural networks (CNNs). Designing appropriate convolutional neural networks on manifold-structured point clouds can inherit and empower recent advances of CNNs for analyzing and processing point cloud data. However, one of the major challenges is to define a proper way to "sweep" filters through the point cloud as a natural generalization of the planar convolution and to reflect the point cloud's geometry at the same time. In this paper, we consider generalizing convolution by adapting parallel transport on the point cloud. Inspired by a triangulated surface-based method [Stefan C. Schonsheck, Bin Dong, and Rongjie Lai, arXiv:1805.07857], we propose the Narrow-Band Parallel Transport Convolution (NPTC) using a specifically defined connection on a voxel-based narrow-band approximation of point cloud data. With that, we further propose a deep convolutional neural network based on NPTC (called NPTC-net) for point cloud classification and segmentation. Comprehensive experiments show that the proposed NPTC-net achieves similar or better results than current state-of-the-art methods on point cloud classification and segmentation.

Preprint, 2020, arXiv

Chart Auto-Encoders for Manifold Structured Data

Deep generative models have made tremendous advances in image and signal representation learning and generation. These models employ the full Euclidean space or a bounded subset as the latent space, whose flat geometry, however, is often too simplistic to meaningfully reflect the manifold structure of the data. In this work, we advocate the use of a multi-chart latent space for better data representation. Inspired by differential geometry, we propose a \textbf{Chart Auto-Encoder (CAE)} and prove a universal approximation theorem on its representation capability. We show that the training data size and the network size scale exponentially in approximation error with an exponent depending on the intrinsic dimension of the data manifold. CAE admits desirable manifold properties that auto-encoders with a flat latent space fail to obey, predominantly proximity of data. We conduct extensive experimentation with synthetic and real-life examples to demonstrate that CAE provides reconstruction with high fidelity, preserves proximity in the latent space, and generates new data remaining near the manifold. These experiments show that CAE is advantageous over existing auto-encoders and variants by preserving the topology of the data manifold as well as its geometry.

Preprint, 2020, arXiv

Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment

In this work, we introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment through sparsity control for shape correspondence between two approximately isometric surfaces. Our pairwise descriptor is based on the stiffness and mass matrices of the finite element approximation of the Laplace-Beltrami differential operator, which is local in space, sparse to represent, and extremely easy to compute while containing global information. It allows us to deal with open surfaces, partial matching, and topological perturbations robustly. To solve the resulting quadratic assignment problem efficiently, the two key ideas of our iterative algorithm are: 1) select pairs with good (approximate) correspondence as anchor points, 2) solve a regularized quadratic assignment problem only in the neighborhood of selected anchor points through sparsity control. These two ingredients can improve and increase the number of anchor points quickly while reducing the computation cost in each quadratic assignment iteration significantly. With enough high-quality anchor points, one may use various pointwise global features with reference to these anchor points to further improve the dense shape correspondence. We use various experiments to show the efficiency, quality, and versatility of our method on large data sets, patches, and point clouds (without global meshes).

Preprint, 2018, arXiv

Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator

For node-level graph encoding, an important recent state-of-the-art method is the graph convolutional network (GCN), which nicely integrates local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs rely on Chebyshev polynomial approximation, which results in oscillatory approximation at jump discontinuities; (2) increasing the order of the Chebyshev polynomial can reduce the oscillations, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree $Ω(\mathrm{poly}(1/ε))$ to approximate a jump signal such as $|x|$, while a rational function only needs $\mathcal{O}(\mathrm{poly}\log(1/ε))$ \cite{liang2016deep,telgarsky2017neural}. However, it is non-trivial to apply rational approximation without increasing computational complexity due to the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovery. RationalNet is proposed to integrate rational functions and neural networks. We show that a rational function of eigenvalues can be rewritten as a function of the graph Laplacian, which avoids multiplication by the eigenvector matrix. Focusing on the analysis of approximation of the graph convolution operation, a graph signal regression task is formulated. Under the graph signal regression task, its time complexity can be significantly reduced by the graph Fourier transform. To overcome the local minimum problem of neural network models, a relaxed Remez algorithm is utilized to initialize the weight parameters. The convergence rates of RationalNet and polynomial-based methods on jump signals are analyzed for a theoretical guarantee. Extensive experimental results demonstrate that our approach can effectively characterize the jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.

Preprint, 2017, arXiv

Triangulated Surface Denoising using High Order Regularization with Dynamic Weights

Recovering high-quality surfaces from noisy triangulated surfaces is a fundamentally important problem in geometry processing. Sharp features, including edges and corners, cannot be well preserved by most existing denoising methods, except the recent total variation (TV) and $\ell_0$ regularization methods. However, these two methods suffer from staircase artifacts in smooth regions. In this paper, we first introduce a second-order regularization method for restoring a surface normal vector field, and then propose a new vertex updating scheme to recover the desired surface according to the restored surface normal field. The proposed model can preserve sharp features and simultaneously suppress the staircase effects in smooth regions, which overcomes the drawback of first-order models. In addition, the new vertex updating scheme can prevent ambiguities introduced in existing vertex updating methods. Numerically, the proposed high-order model is solved by the augmented Lagrangian method with a dynamic weighting strategy. Extensive numerical experiments on a variety of surfaces demonstrate the superiority of our method both visually and quantitatively.

Preprint, 2016, arXiv

Maximization of Laplace-Beltrami eigenvalues on closed Riemannian surfaces

Let $(M,g)$ be a connected, closed, orientable Riemannian surface and denote by $λ_k(M,g)$ the $k$-th eigenvalue of the Laplace-Beltrami operator on $(M,g)$. In this paper, we consider the mapping $(M, g)\mapsto λ_k(M,g)$. We propose a computational method for finding the conformal spectrum $Λ^c_k(M,[g_0])$, which is defined by the eigenvalue optimization problem of maximizing $λ_k(M,g)$ for $k$ fixed as $g$ varies within a conformal class $[g_0]$ of fixed volume $\textrm{vol}(M,g) = 1$. We also propose a computational method for the problem where $M$ is additionally allowed to vary over surfaces with fixed genus, $γ$. This is known as the topological spectrum for genus $γ$ and denoted by $Λ^t_k(γ)$. Our computations support a conjecture of N. Nadirashvili (2002) that $Λ^t_k(0) = 8πk$, attained by a sequence of surfaces degenerating to a union of $k$ identical round spheres. Furthermore, based on our computations, we conjecture that $Λ^t_k(1) = \frac{8π^2}{\sqrt{3}} + 8π(k-1)$, attained by a sequence of surfaces degenerating into a union of an equilateral flat torus and $k-1$ identical round spheres. The values are compared to several surfaces where the Laplace-Beltrami eigenvalues are well-known, including spheres, flat tori, and embedded tori. In particular, we show that among flat tori of volume one, the $k$-th Laplace-Beltrami eigenvalue has a local maximum with value $λ_k = 4π^2 \left\lceil \frac{k}{2} \right\rceil^2 \left( \left\lceil \frac{k}{2} \right\rceil^2 - \frac{1}{4}\right)^{-\frac{1}{2}}$. Several properties are also studied computationally, including uniqueness, symmetry, and eigenvalue multiplicity.
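The closed-form values quoted in this abstract are easy to evaluate numerically. The sketch below simply transcribes the three expressions (the genus-0 value, the conjectured genus-1 value, and the flat-torus local maximum); the function names are hypothetical and are not taken from the paper.

```python
import math


def conjectured_genus_zero(k):
    """Value 8*pi*k quoted for the topological spectrum of genus 0."""
    return 8 * math.pi * k


def conjectured_genus_one(k):
    """Value 8*pi^2/sqrt(3) + 8*pi*(k-1) conjectured for genus 1."""
    return 8 * math.pi**2 / math.sqrt(3) + 8 * math.pi * (k - 1)


def flat_torus_local_max(k):
    """Local maximum of the k-th eigenvalue among flat tori of volume one:
    4*pi^2 * m^2 * (m^2 - 1/4)^(-1/2) with m = ceil(k/2)."""
    m = math.ceil(k / 2)
    return 4 * math.pi**2 * m**2 / math.sqrt(m**2 - 0.25)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, conjectured_genus_zero(k), conjectured_genus_one(k),
              flat_torus_local_max(k))
```

For $k = 1$ the flat-torus expression reduces to $8π^2/\sqrt{3}$, which coincides with the conjectured genus-1 value at $k = 1$.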

Preprint, 2015, arXiv

Localized density matrix minimization and linear scaling algorithms

We propose a convex variational approach to compute localized density matrices for both zero temperature and finite temperature cases, by adding an entry-wise $\ell_1$ regularization to the free energy of the quantum system. Based on the fact that the density matrix decays exponentially away from the diagonal for insulating systems or systems at finite temperature, the proposed $\ell_1$ regularized variational method provides a nice way to approximate the original quantum system. We provide theoretical analysis of the approximation behavior and also design convergence-guaranteed numerical algorithms based on Bregman iteration. More importantly, the $\ell_1$ regularized system naturally leads to localized density matrices with banded structure, which enables us to develop approximating algorithms to find the localized density matrices with computation cost linearly dependent on the problem size.

Preprint, 2014, arXiv

Compressed Wannier modes found from an $L_1$ regularized energy functional

We propose a method for calculating Wannier functions of periodic solids directly from a modified variational principle for the energy, subject to the requirement that the Wannier functions are orthogonal to all their translations ("shift-orthogonality"). Localization is achieved by adding an $L_1$ regularization term to the energy functional. This approach results in "compressed" Wannier modes with compact support, where one parameter $μ$ controls the trade-off between the accuracy of the total energy and the size of the support of the Wannier modes. Efficient algorithms for shift-orthogonalization and solution of the variational minimization problem are demonstrated.

Preprint, 2014, arXiv

Density matrix minimization with $\ell_1$ regularization

We propose a convex variational principle to find a sparse representation of the low-lying eigenspace of symmetric matrices. In the context of electronic structure calculation, this corresponds to a sparse density matrix minimization algorithm with $\ell_1$ regularization. The minimization problem can be efficiently solved by a split Bregman iteration type algorithm. We further prove that from any initial condition, the algorithm converges to a minimizer of the variational principle.

Preprint, 2014, arXiv

Multi-scale Non-Rigid Point Cloud Registration Using Robust Sliced-Wasserstein Distance via Laplace-Beltrami Eigenmap

In this work, we propose computational models and algorithms for point cloud registration with non-rigid transformation. First, point clouds sampled from manifolds originally embedded in some Euclidean space $\mathbb{R}^D$ are transformed to new point clouds embedded in $\mathbb{R}^n$ by the Laplace-Beltrami (LB) eigenmap using the $n$ leading eigenvalues and corresponding eigenfunctions of the LB operator defined intrinsically on the manifolds. The LB eigenmap is invariant under isometric transformations of the original manifolds. Then we design computational models and algorithms for registration of the transformed point clouds in distribution/probability form based on optimal transport theory, which provides both generality and flexibility to handle general point cloud settings. Our methods use the robust sliced-Wasserstein distance, which is defined as the average of projected Wasserstein distances along different directions, and incorporate a rigid transformation to handle ambiguities introduced by the Laplace-Beltrami eigenmap. By going from a smaller $n$, which provides a quick and robust registration (based on coarse-scale features) as well as a good initial guess for finer-scale registration, to a larger $n$, our method provides an efficient, robust and accurate approach for multi-scale non-rigid point cloud registration.
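To make the phrase "average of projected Wasserstein distances along different directions" concrete, here is a minimal sketch of a plain Monte Carlo sliced-Wasserstein distance using only the Python standard library. It is not the robust variant or the registration pipeline from the paper: the function names, the random choice of directions, and the assumption that both point clouds contain the same number of points are all choices made for this illustration.

```python
import math
import random


def wasserstein_1d(a, b, p=2):
    """p-Wasserstein distance between two equal-size 1D samples.
    In 1D the optimal matching pairs the sorted values."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    cost = sum(abs(x - y) ** p for x, y in zip(a_sorted, b_sorted))
    return (cost / len(a_sorted)) ** (1.0 / p)


def random_direction(dim, rng):
    """Unit vector drawn uniformly from the sphere (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(X, Y, n_dirs=200, p=2, seed=0):
    """Monte Carlo estimate: average the p-th power of the projected
    1D Wasserstein distances over random directions, then take the p-th root."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_dirs):
        theta = random_direction(dim, rng)
        # Project every point of both clouds onto the direction theta.
        px = [sum(c * t for c, t in zip(x, theta)) for x in X]
        py = [sum(c * t for c, t in zip(y, theta)) for y in Y]
        total += wasserstein_1d(px, py, p) ** p
    return (total / n_dirs) ** (1.0 / p)


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    Y = [[x + 3.0, y + 4.0] for x, y in X]  # X translated by (3, 4)
    print(sliced_wasserstein(X, Y))
```

Each one-dimensional problem is solved by sorting, which is what makes the sliced distance cheap compared with solving a full optimal transport problem in $\mathbb{R}^n$.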

Preprint, 2014, arXiv

Projection to the Set of Shift Orthogonal Functions

This paper presents a fast algorithm for projecting a given function to the set of shift orthogonal functions (i.e., the set of functions with unit $L^2$ norm that are orthogonal to their prescribed shifts). The algorithm can be parallelized easily and its computational complexity is bounded by $O(M\log(M))$, where $M$ is the number of coefficients used for storing the input. To derive the algorithm, a particular class of bases called Shift Orthogonal Basis Functions is introduced and some theory regarding them is developed.

Preprint, 2013, arXiv

Compressed Modes for Variational Problems in Mathematics and Physics

This paper describes a general formalism for obtaining localized solutions to a class of problems in mathematical physics, which can be recast as variational optimization problems. This class includes the important cases of Schrödinger's equation in quantum mechanics and electromagnetic equations for light propagation in photonic crystals. These ideas can also be applied to develop a spatially localized basis that spans the eigenspace of a differential operator, for instance, the Laplace operator, generalizing the concept of plane waves to an orthogonal real-space basis with multi-resolution capabilities.
