Researcher profile

Maxim Raginsky

Maxim Raginsky contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces

We generalize Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for an unknown system are carried out without an explicit system identification step. To that end, we link the behavioral approach to two data-driven modeling methods in a vector-valued RKHS: (1) minimum-norm interpolation and (2) subspace identification.

preprint2026arXiv

Information-theoretic Limits of Learning and Estimation

Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can -- and cannot -- achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises makes the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entropy. These tools are essential for our analysis. We then introduce the learning-theoretic framework and derive upper bounds on generalization error in terms of metric entropy, Rademacher complexity, and the VC dimension, as well as mutual information and relative entropy. Finally we discuss the minimax estimation framework and establish lower bounds on minimax risk using Fano's inequality, yielding bounds in terms of relative entropy and covering and packing numbers. This manuscript contains preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. It would follow the chapter posted at arXiv:2605.02989 . The table of contents of the new edition can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu.

preprint2026arXiv

Spatially-Coupled Network RNA Velocities: A Control-Theoretic Perspective

RNA velocity is an important model that combines cellular spliced and unspliced RNA counts to infer dynamical properties of various regulatory functions. Despite its wide applicability and many variants used in practice, the model has not been adequately designed to directly account for both intracellular gene regulatory network interactions and spatial intercellular communications. Here, we propose a new RNA velocity approach that jointly and directly captures two new network structures: an intracellular gene regulatory network (GRN) and an intercellular interaction network that captures interactions between (neighboring) cells, with relevance to spatial transcriptomics. We theoretically analyze this two-level network system through the lens of control and consensus theory. In particular, we investigate network equilibria, stability, cellular network consensus, and optimal control approaches for targeted drug intervention.

preprint2026arXiv

Talagrand Meets Talagrand: Upper and Lower Bounds on Expected Soft Maxima of Gaussian Processes with Finite Index Sets

Analysis of extremal behavior of stochastic processes is a key ingredient in a wide variety of applications, including probability, statistical physics, theoretical computer science, and learning theory. In this paper, we consider centered Gaussian processes on finite index sets and investigate expected values of their smoothed, or ``soft,'' maxima. We obtain upper and lower bounds for these expected values using a combination of ideas from statistical physics (the Gibbs variational principle for the equilibrium free energy and replica-symmetric representations of Gibbs averages) and from probability theory (Sudakov minoration). These bounds are parametrized by an inverse temperature $β> 0$ and reduce to the usual Gaussian maximal inequalities in the zero-temperature limit $β\to \infty$. We provide an illustration of our methods in the context of the Random Energy Model, one of the simplest models of physical systems with random disorder.

preprint2022arXiv

Fitting an immersed submanifold to data via Sussmann's orbit theorem

This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples. The reconstruction mapping from the ambient space to the desired submanifold is implemented as a composition of an encoder that maps each point to a tuple of (positive or negative) times and a decoder given by a composition of flows along finitely many vector fields starting from a fixed initial point. The encoder supplies the times for the flows. The encoder-decoder map is obtained by empirical risk minimization, and a high-probability bound is given on the excess risk relative to the minimum expected reconstruction error over a given class of encoder-decoder maps. The proposed approach makes fundamental use of Sussmann's orbit theorem, which guarantees that the image of the reconstruction map is indeed contained in an immersed submanifold.

preprint2022arXiv

Input-to-State Stable Neural Ordinary Differential Equations with Applications to Transient Modeling of Circuits

This paper proposes a class of neural ordinary differential equations parametrized by provably input-to-state stable continuous-time recurrent neural networks. The model dynamics are defined by construction to be input-to-state stable (ISS) with respect to an ISS-Lyapunov function that is learned jointly with the dynamics. We use the proposed method to learn cheap-to-simulate behavioral models for electronic circuits that can accurately reproduce the behavior of various digital and analog circuits when simulated by a commercial circuit simulator, even when interconnected with circuit components not encountered during training. We also demonstrate the feasibility of learning ISS-preserving perturbations to the dynamics for modeling degradation effects due to circuit aging.

preprint2021arXiv

Minimum Excess Risk in Bayesian Learning

We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known. The definition of MER provides a principled way to define different notions of uncertainties in Bayesian learning, including the aleatoric uncertainty and the minimum epistemic uncertainty. Two methods for deriving upper bounds for the MER are presented. The first method, generally suitable for Bayesian learning with a parametric generative model, upper-bounds the MER by the conditional mutual information between the model parameters and the quantity being predicted given the observed data. It allows us to quantify the rate at which the MER decays to zero as more data becomes available. Under realizable models, this method also relates the MER to the richness of the generative function class, notably the VC dimension in binary classification. The second method, particularly suitable for Bayesian learning with a parametric predictive model, relates the MER to the minimum estimation error of the model parameters from data. It explicitly shows how the uncertainty in model parameter estimation translates to the MER and to the final prediction uncertainty. We also extend the definition and analysis of MER to the setting with multiple model families and the setting with nonparametric models. Along the discussions we draw some comparisons between the MER in Bayesian learning and the excess risk in frequentist learning.

preprint2020arXiv

Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or computational complexity. We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. First, neural networks are used to map features to low-dimensional representations. In the second step, CI testing is performed by applying the $k$-NN conditional mutual information estimator to the learned feature maps. The mappings are designed to ensure that mapped samples both preserve information and share similar information about the target variable if and only if they are close in Euclidean distance. We show that these properties boost the performance of the $k$-NN estimator in the second step. The performance of the proposed method is evaluated on both synthetic and real data.

preprint2020arXiv

Non-signaling Approximations of Stochastic Team Problems

In this paper, we consider non-signaling approximation of finite stochastic teams. We first introduce a hierarchy of team decision rules that can be classified in an increasing order as randomized policies, quantum-correlated policies, and non-signaling policies. Then, we establish an approximation of team-optimal policies for sequential teams via extendible non-signaling policies. We prove that the distance between extendible non-signaling policies and decentralized policies is small if the extension is sufficiently large. Using this result, we establish a linear programming (LP) approximation of sequential teams. Finally, we state an open problem regarding computation of optimal value of quantum-correlated policies.

preprint2018arXiv

Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability

We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003. For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the $\varepsilon$-neighborhood of this particular optimum within a short recurrence time; (2) it enters this $\varepsilon$-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time. We call this phenomenon empirical metastability. This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the effective recurrence time (i.e., number of iterations multiplied by stepsize) is dimension-independent, and resembles the convergence time of continuous-time deterministic Gradient Descent (GD). However unlike GD, the Langevin algorithm does not require strong conditions on local initialization, and has the possibility of eventually visiting all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, which states that the Langevin scheme will eventually visit all local minima, but it will take an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.

preprint2018arXiv

Minimax Statistical Learning with Wasserstein Distances

As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove generalization bounds that involve the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for transport-based domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples.