Researcher profile

Rongjie Lai

Rongjie Lai contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2026arXiv

In-Context Operator Learning on the Space of Probability Measures

We introduce \emph{in-context operator learning on probability measure spaces} for optimal transport (OT). The goal is to learn a single solution operator that maps a pair of distributions to the OT map, using only few-shot samples from each distribution as a prompt and \emph{without} gradient updates at inference. We parameterize the solution operator and develop scaling-law theory in two regimes. In the \emph{nonparametric} setting, when tasks concentrate on a low-intrinsic-dimension manifold of source--target pairs, we establish generalization bounds that quantify how in-context accuracy scales with prompt size, intrinsic task dimension, and model capacity. In the \emph{parametric} setting (e.g., Gaussian families), we give an explicit architecture that recovers the exact OT map in context and provide finite-sample excess-risk bounds. Our numerical experiments on synthetic transports and generative-modeling benchmarks validate the framework.

preprint2026arXiv

Learn to Evolve: Self-supervised Neural JKO Operator for Wasserstein Gradient Flow

The Jordan-Kinderlehrer-Otto (JKO) scheme provides a stable variational framework for computing Wasserstein gradient flows, but its practical use is often limited by the high computational cost of repeatedly solving the JKO subproblems. We propose a self-supervised approach for learning a JKO solution operator without requiring numerical solutions of any JKO trajectories. The learned operator maps an input density directly to the minimizer of the corresponding JKO subproblem, and can be iteratively applied to efficiently generate the gradient-flow evolution. A key challenge is that only a number of initial densities are typically available for training. To address this, we introduce a Learn-to-Evolve algorithm that jointly learns the JKO operator and its induced trajectories by alternating between trajectory generation and operator updates. As training progresses, the generated data increasingly approximates true JKO trajectories. Meanwhile, this Learn-to-Evolve strategy serves as a natural form of data augmentation, significantly enhancing the generalization ability of the learned operator. Numerical experiments demonstrate the accuracy, stability, and robustness of the proposed method across various choices of energies and initial conditions.

preprint2026arXiv

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.

preprint2025arXiv

Self-Supervised Amortized Neural Operators for Optimal Control: Scaling Laws and Applications

Optimal control provides a principled framework for transforming dynamical system models into intelligent decision-making, yet classical computational approaches are often too expensive for real-time deployment in dynamic or uncertain environments. In this work, we propose a method based on self-supervised neural operators for open-loop optimal control problems. It offers a new paradigm by directly approximating the mapping from system conditions to optimal control strategies, enabling instantaneous inference across diverse scenarios once trained. We further extend this framework to more complex settings, including dynamic or partially observed environments, by integrating the learned solution operator with Model Predictive Control (MPC). This yields a solution-operator learning method for closed-loop control, in which the learned operator supplies rapid predictions that replace the potentially time-consuming optimization step in conventional MPC. This acceleration comes with a quantifiable price to pay. Theoretically, we derive scaling laws that relate generalization error and sample/model complexity to the intrinsic dimension of the problem and the regularity of the optimal control function. Numerically, case studies show efficient, accurate real-time performance in low-intrinsic-dimension regimes, while accuracy degrades as problem complexity increases. Together, these results provide a balanced perspective: neural operators are a powerful novel tool for high-performance control when hidden low-dimensional structure can be exploited, yet they remain fundamentally constrained by the intrinsic dimensional complexity in more challenging settings.

preprint2022arXiv

Learning Geometrically Disentangled Representations of Protein Folding Simulations

Massive molecular simulations of drug-target proteins have been used as a tool to understand disease mechanism and develop therapeutics. This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein, e.g. SARS-CoV-2 Spike protein, obtained from computationally expensive molecular simulations. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules, as well as efficient generation of protein conformations that can serve as an complement of a molecular simulation engine. Specifically, we present a geometric autoencoder framework to learn separate latent space encodings of the intrinsic and extrinsic geometries of the protein structure. For this purpose, the proposed Protein Geometric AutoEncoder (ProGAE) model is trained on the protein contact map and the orientation of the backbone bonds of the protein. Using ProGAE latent embeddings, we reconstruct and generate the conformational ensemble of a protein at or near the experimental resolution, while gaining better interpretability and controllability in term of protein structure generation from the learned latent space. Additionally, ProGAE models are transferable to a different state of the same protein or to a new protein of different size, where only the dense layer decoding from the latent representation needs to be retrained. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations, charting the path toward scalable and improved approaches for analyzing and enhancing high-cost simulations of drug-target proteins.

preprint2022arXiv

Quasi-Equivalence of Width and Depth of Neural Networks

While classic studies proved that wide networks allow universal approximation, recent research and successes of deep learning demonstrate the power of deep networks. Based on a symmetric consideration, we investigate if the design of artificial neural networks should have a directional preference, and what the mechanism of interaction is between the width and depth of a network. Inspired by the De Morgan law, we address this fundamental question by establishing a quasi-equivalence between the width and depth of ReLU networks in two aspects. First, we formulate two transforms for mapping an arbitrary ReLU network to a wide network and a deep network respectively for either regression or classification so that the essentially same capability of the original network can be implemented. Then, we replace the mainstream artificial neuron type with a quadratic counterpart, and utilize the factorization and continued fraction representations of the same polynomial function to construct a wide network and a deep network, respectively. Based on our findings, a deep network has a wide equivalent, and vice versa, subject to an arbitrarily small error.

preprint2021arXiv

A Fast Proximal Gradient Method and Convergence Analysis for Dynamic Mean Field Planning

In this paper, we propose an efficient and flexible algorithm to solve dynamic mean-field planning problems based on an accelerated proximal gradient method. Besides an easy-to-implement gradient descent step in this algorithm, a crucial projection step becomes solving an elliptic equation whose solution can be obtained by conventional methods efficiently. By induction on iterations used in the algorithm, we theoretically show that the proposed discrete solution converges to the underlying continuous solution as the grid size increases. Furthermore, we generalize our algorithm to mean-field game problems and accelerate it using multilevel and multigrid strategies. We conduct comprehensive numerical experiments to confirm the convergence analysis of the proposed algorithm, to show its efficiency and mass preservation property by comparing it with state-of-the-art methods, and to illustrates its flexibility for handling various mean-field variational problems.

preprint2021arXiv

NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds

Convolution plays a crucial role in various applications in signal and image processing, analysis, and recognition. It is also the main building block of convolution neural networks (CNNs). Designing appropriate convolution neural networks on manifold-structured point clouds can inherit and empower recent advances of CNNs to analyzing and processing point cloud data. However, one of the major challenges is to define a proper way to "sweep" filters through the point cloud as a natural generalization of the planar convolution and to reflect the point cloud's geometry at the same time. In this paper, we consider generalizing convolution by adapting parallel transport on the point cloud. Inspired by a triangulated surface-based method [Stefan C. Schonsheck, Bin Dong, and Rongjie Lai, arXiv:1805.07857.], we propose the Narrow-Band Parallel Transport Convolution (NPTC) using a specifically defined connection on a voxel-based narrow-band approximation of point cloud data. With that, we further propose a deep convolutional neural network based on NPTC (called NPTC-net) for point cloud classification and segmentation. Comprehensive experiments show that the proposed NPTC-net achieves similar or better results than current state-of-the-art methods on point cloud classification and segmentation.

preprint2020arXiv

Chart Auto-Encoders for Manifold Structured Data

Deep generative models have made tremendous advances in image and signal representation learning and generation. These models employ the full Euclidean space or a bounded subset as the latent space, whose flat geometry, however, is often too simplistic to meaningfully reflect the manifold structure of the data. In this work, we advocate the use of a multi-chart latent space for better data representation. Inspired by differential geometry, we propose a \textbf{Chart Auto-Encoder (CAE)} and prove a universal approximation theorem on its representation capability. We show that the training data size and the network size scale exponentially in approximation error with an exponent depending on the intrinsic dimension of the data manifold. CAE admits desirable manifold properties that auto-encoders with a flat latent space fail to obey, predominantly proximity of data. We conduct extensive experimentation with synthetic and real-life examples to demonstrate that CAE provides reconstruction with high fidelity, preserves proximity in the latent space, and generates new data remaining near the manifold. These experiments show that CAE is advantageous over existing auto-encoders and variants by preserving the topology of the data manifold as well as its geometry.

preprint2020arXiv

Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment

In this work, we introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment through sparsity control for shape correspondence between two approximate isometric surfaces. Our pairwise descriptor is based on the stiffness and mass matrix of finite element approximation of the Laplace-Beltrami differential operator, which is local in space, sparse to represent, and extremely easy to compute while containing global information. It allows us to deal with open surfaces, partial matching, and topological perturbations robustly. To solve the resulting quadratic assignment problem efficiently, the two key ideas of our iterative algorithm are: 1) select pairs with good (approximate) correspondence as anchor points, 2) solve a regularized quadratic assignment problem only in the neighborhood of selected anchor points through sparsity control. These two ingredients can improve and increase the number of anchor points quickly while reducing the computation cost in each quadratic assignment iteration significantly. With enough high-quality anchor points, one may use various pointwise global features with reference to these anchor points to further improve the dense shape correspondence. We use various experiments to show the efficiency, quality, and versatility of our method on large data sets, patches, and point clouds (without global meshes).

preprint2018arXiv

Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator

For node level graph encoding, a recent important state-of-art method is the graph convolutional networks (GCN), which nicely integrate local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs relies on Chebyshev polynomial approximation which results in oscillatory approximation at jump discontinuities; (2) Increasing the order of Chebyshev polynomial can reduce the oscillations issue, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree $Ω$(poly(1/$ε$)) to approximate a jump signal such as $|x|$, while rational function only needs $\mathcal{O}$(poly log(1/$ε$))\cite{liang2016deep,telgarsky2017neural}. However, it's non-trivial to apply rational approximation without increasing computational complexity due to the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovering. RatioanlNet is proposed to integrate rational function and neural networks. We show that rational function of eigenvalues can be rewritten as a function of graph Laplacian, which can avoid multiplication by the eigenvector matrix. Focusing on the analysis of approximation on graph convolution operation, a graph signal regression task is formulated. Under graph signal regression task, its time complexity can be significantly reduced by graph Fourier transform. To overcome the local minimum problem of neural networks model, a relaxed Remez algorithm is utilized to initialize the weight parameters. Convergence rate of RatioanlNet and polynomial based methods on jump signal is analyzed for a theoretical guarantee. The extensive experimental results demonstrated that our approach could effectively characterize the jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.

preprint2017arXiv

Triangulated Surface Denoising using High Order Regularization with Dynamic Weights

Recovering high quality surfaces from noisy triangulated surfaces is a fundamental important problem in geometry processing. Sharp features including edges and corners can not be well preserved in most existing denoising methods except the recent total variation (TV) and $\ell_0$ regularization methods. However, these two methods have suffered producing staircase artifacts in smooth regions. In this paper, we first introduce a second order regularization method for restoring a surface normal vector field, and then propose a new vertex updating scheme to recover the desired surface according to the restored surface normal field. The proposed model can preserve sharp features and simultaneously suppress the staircase effects in smooth regions which overcomes the drawback of the first order models. In addition, the new vertex updating scheme can prevent ambiguities introduced in existing vertex updating methods. Numerically, the proposed high order model is solved by the augmented Lagrangian method with a dynamic weighting strategy. Intensive numerical experiments on a variety of surfaces demonstrate the superiority of our method by visually and quantitatively.