Source author record

Peyman Mohajerin Esfahani

Peyman Mohajerin Esfahani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Systems and Control Machine Learning eess.SY Information Theory math.IT math.DS math.ST Statistics Theory Artificial Intelligence Computation and Language Cryptography and Security quant-ph

Catalog footprint

What is connected

19works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

From Optimization to Control: Quasi Policy Iteration

Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we adopt the quasi-Newton method (QNM) from convex optimization to introduce a novel control algorithm coined as quasi-policy iteration (QPI). In particular, QPI is based on a novel approximation of the ``Hessian'' matrix in the policy iteration algorithm, which exploits two linear structural constraints specific to MDPs and allows for the incorporation of prior information on the transition probability kernel. While the proposed algorithm has the same computational complexity as value iteration, it exhibits an empirical convergence behavior similar to that of QNM with a low sensitivity to the discount factor.

preprint2026arXiv

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards

Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs: sample a completion, verify it, and update. In practice, however, the verifier is almost never clean--unit tests probe only limited corner cases; human and synthetic labels are imperfect; and LLM judges (e.g., RLAIF) are noisy and can be exploited--and this problem worsens on harder domains (especially coding) where tests are sparse and increasingly model-generated. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)? To address this, we develop an analytically tractable multi-armed bandit view of RLVR dynamics, instantiated with GRPO and validated in controlled experiments. Modeling false positives and false negatives and grouping completions into recurring reasoning modes yields a replicator-style (natural-selection) flow on the probability simplex. The dynamics decouples into within-correct-mode competition and a one-dimensional evolution for the mass on incorrect modes, whose drift is determined solely by Youden's index J=TPR-FPR. This yields a sharp phase transition: when J>0, the incorrect mass is driven toward extinction (learning); when J=0, the process is neutral; and when J<0, incorrect modes amplify until they dominate (anti-learning and collapse). In the learning regime J>0, noise primarily rescales convergence time ("rate, not fate"). Experiments on verifiable programming tasks under synthetic noise reproduce the predicted J=0 boundary. Beyond noise, the framework offers a general lens for analyzing RLVR stability, convergence, and algorithmic interventions.

preprint2026arXiv

Tight Generalization Bounds for Noiseless Inverse Optimization

Inverse optimization (IO) seeks to infer the parameters of a decision-maker's objective from observed context--action data. We study noiseless IO, where demonstrations are generated by a ground-truth objective. We provide a high-probability ${O}(\frac{d}{T})$ generalization bound for the induced action set, where $d$ is the number of unknown parameters and $T$ is the size of the training dataset. We strengthen these guarantees under additional conditions that ensure uniqueness of the chosen action, bringing our IO guarantees in line with best-arm identification results in the bandit literature. We further show that the ${O}(\frac{d}{T})$ rate is tight over all consistent estimators considered here, and extend the result to both instantaneous and cumulative regret. Notably, the resulting regret lower bound matches the corresponding upper bounds in the adversarial setting, indicating that the stochastic IO setting is effectively adversarial for the class of estimators studied here. Finally, we propose a parameter-free algorithm with lower per-iteration complexity than generic solvers. Experiments validate the predicted rates and illustrate the tightness of our bounds.

preprint2023arXiv

Adaptive Composite Online Optimization: Predictions in Static and Dynamic Environments

In the past few years, Online Convex Optimization (OCO) has received notable attention in the control literature thanks to its flexible real-time nature and powerful performance guarantees. In this paper, we propose new step-size rules and OCO algorithms that simultaneously exploit gradient predictions, function predictions and dynamics, features particularly pertinent to control applications. The proposed algorithms enjoy static and dynamic regret bounds in terms of the dynamics of the reference action sequence, gradient prediction error, and function prediction error, which are generalizations of known regularity measures from the literature. We present results for both convex and strongly convex costs. We validate the performance of the proposed algorithms in a trajectory tracking case study, as well as portfolio optimization using real-world datasets.

preprint2022arXiv

Multimode Diagnosis for Switched Affine Systems with Noisy Measurement

We study a diagnosis scheme to reliably detect the active mode of discrete-time, switched affine systems in the presence of measurement noise and asynchronous switching. The proposed scheme consists of two parts: (i) the construction of a bank of filters, and (ii) the introduction of a residual/threshold-based diagnosis rule. We develop an exact finite optimization-based framework to numerically solve an optimal bank of filters in which the contribution of measurement noise to the residual is minimized. The design problem is safely approximated through linear matrix inequalities and thus becomes tractable. We further propose a thresholding policy along with probabilistic false-alarm guarantees to estimate the active system mode in real-time. In comparison with the existing results, the guarantees improve from a polynomial dependency in the probability of false alarm to a logarithmic form. This improvement is achieved under the additional assumption of sub-Gaussianity, which is expected in many applications. The performance of the proposed approach is validated through a numerical example and an application of the building radiant system.

preprint2022arXiv

Multiple Faults Estimation in Dynamical Systems: Tractable Design and Performance Bounds

In this article, we propose a tractable nonlinear fault isolation filter along with explicit performance bounds for a class of nonlinear dynamical systems. We consider the presence of additive and multiplicative faults, occurring simultaneously and through an identical dynamical relationship, which represents a relevant case in several application domains. The proposed filter architecture combines tools from model-based approaches in the control literature and regression techniques from machine learning. To this end, we view the regression operator through a system-theoretic perspective to develop operator bounds that are then utilized to derive performance bounds for the proposed estimation filter. In the case of constant, simultaneously and identically acting additive and multiplicative faults, it can be shown that the estimation error converges to zero with an exponential rate. The performance of the proposed estimation filter in the presence of incipient faults is validated through an application on the lateral safety systems of SAE level 4 automated vehicles. The numerical results show that the theoretical bounds of this study are indeed close to the actual estimation error.

preprint2022arXiv

The Nonconvex Geometry of Linear Inverse Problems

The gauge function, closely related to the atomic norm, measures the complexity of a statistical model, and has found broad applications in machine learning and statistical signal processing. In a high-dimensional learning problem, the gauge function attempts to safeguard against overfitting by promoting a sparse (concise) representation within the learning alphabet. In this work, within the context of linear inverse problems, we pinpoint the source of its success, but also argue that the applicability of the gauge function is inherently limited by its convexity, and showcase several learning problems where the classical gauge function theory fails. We then introduce a new notion of statistical complexity, gauge$_p$ function, which overcomes the limitations of the gauge function. The gauge$_p$ function is a simple generalization of the gauge function that can tightly control the sparsity of a statistical model within the learning alphabet and, perhaps surprisingly, draws further inspiration from the Burer-Monteiro factorization in computational mathematics. We also propose a new learning machine, with the building block of gauge$_p$ function, and arm this machine with a number of statistical guarantees. The potential of the proposed gauge$_p$ function theory is then studied for two stylized applications. Finally, we discuss the computational aspects and, in particular, suggest a tractable numerical algorithm for implementing the new learning machine.

preprint2021arXiv

Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

We introduce a distributionally robust minimium mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator -- that is, a measurable function of the observation -- and a fictitious adversary choosing a prior -- that is, a pair of signal and noise distributions ranging over independent Wasserstein balls -- with the goal to minimize and maximize the expected squared estimation error, respectively. We show that if the Wasserstein balls are centered at normal distributions, then the zero-sum game admits a Nash equilibrium, where the players' optimal strategies are given by an {\em affine} estimator and a {\em normal} prior, respectively. We further prove that this Nash equilibrium can be computed by solving a tractable convex program. Finally, we develop a Frank-Wolfe algorithm that can solve this convex program orders of magnitude faster than state-of-the-art general purpose solvers. We show that this algorithm enjoys a linear convergence rate and that its direction-finding subproblems can be solved in quasi-closed form.

preprint2020arXiv

Learning robust control for LQR systems with multiplicative noise via policy gradient

The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.

preprint2020arXiv

Robust Control Design for Linear Systems via Multiplicative Noise

Robust stability and stochastic stability have separately seen intense study in control theory for many decades. In this work we establish relations between these properties for discrete-time systems and employ them for robust control design. Specifically, we examine a multiplicative noise framework which models the inherent uncertainty and variation in the system dynamics which arise in model-based learning control methods such as adaptive control and reinforcement learning. We provide results which guarantee robustness margins in terms of perturbations on the nominal dynamics as well as algorithms which generate maximally robust controllers.

preprint2020arXiv

Security Versus Privacy

Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using an additive noise to preserve the privacy of those whose data is stored on the server. The measure of privacy is inversely proportional to the trace of the Fisher information matrix. It is assumed that an adversary can inject a false bias to the responses. The measure of the security, capturing the ease of detecting the presence of the false data injection, is the sensitivity of the Kullback-Leiber divergence to the additive bias. An optimization problem for balancing privacy and security is proposed and subsequently solved. It is shown that the level of guaranteed privacy times the level of security equals a constant. Therefore, by increasing the level of privacy, the security guarantees can only be weakened and vice versa. Similar results are developed under the differential privacy framework.

preprint2016arXiv

A Tractable Fault Detection and Isolation Approach for Nonlinear Systems with Probabilistic Performance

This article presents a novel perspective along with a scalable methodology to design a fault detection and isolation (FDI) filter for high dimensional nonlinear systems. Previous approaches on FDI problems are either confined to linear systems or they are only applicable to low dimensional dynamics with specific structures. In contrast, shifting attention from the system dynamics to the disturbance inputs, we propose a relaxed design perspective to train a linear residual generator given some statistical information about the disturbance patterns. That is, we propose an optimization-based approach to robustify the filter with respect to finitely many signatures of the nonlinearity. We then invoke recent results in randomized optimization to provide theoretical guarantees for the performance of the proposed filer. Finally, motivated by a cyber-physical attack emanating from the vulnerabilities introduced by the interaction between IT infrastructure and power system, we deploy the developed theoretical results to detect such an intrusion before the functionality of the power system is disrupted.

preprint2016arXiv

Approximations of Stochastic Hybrid Systems: A Compositional Approach

In this paper we propose a compositional framework for the construction of approximations of the interconnection of a class of stochastic hybrid systems. As special cases, this class of systems includes both jump linear stochastic systems and linear stochastic hybrid automata. In the proposed framework, an approximation is itself a stochastic hybrid system, which can be used as a replacement of the original stochastic hybrid system in a controller design process. We employ a notion of so-called stochastic simulation function to quantify the error between the approximation and the original system. In the first part of the paper, we derive sufficient conditions which facilitate the compositional quantification of the error between the interconnection of stochastic hybrid subsystems and that of their approximations using the quantified error between the stochastic hybrid subsystems and their corresponding approximations. In particular, we show how to construct stochastic simulation functions for approximations of interconnected stochastic hybrid systems using the stochastic simulation function for the approximation of each component. In the second part of the paper, we focus on a specific class of stochastic hybrid systems, namely, jump linear stochastic systems, and propose a constructive scheme to determine approximations together with their stochastic simulation functions for this class of systems. Finally, we illustrate the effectiveness of the proposed results by constructing an approximation of the interconnection of four jump linear stochastic subsystems in a compositional way.

preprint2015arXiv

Distributionally Robust Logistic Regression

This paper proposes a distributionally robust approach to logistic regression. We use the Wasserstein distance to construct a ball in the space of probability distributions centered at the uniform distribution on the training samples. If the radius of this ball is chosen judiciously, we can guarantee that it contains the unknown data-generating distribution with high confidence. We then formulate a distributionally robust logistic regression model that minimizes a worst-case expected logloss function, where the worst case is taken over all distributions in the Wasserstein ball. We prove that this optimization problem admits a tractable reformulation and encapsulates the classical as well as the popular regularized logistic regression problems as special cases. We further propose a distributionally robust approach based on Wasserstein balls to compute upper and lower confidence bounds on the misclassification probability of the resulting classifier. These bounds are given by the optimal values of two highly tractable linear programs. We validate our theoretical out-of-sample guarantees through simulated and empirical experiments.

preprint2015arXiv

Efficient Approximation of Channel Capacities

We propose an iterative method for approximately computing the capacity of discrete memoryless channels, possibly under additional constraints on the input distribution. Based on duality of convex programming, we derive explicit upper and lower bounds for the capacity. The presented method requires $O(M^2 N \sqrt{\log N}/\varepsilon)$ to provide an estimate of the capacity to within $\varepsilon$, where $N$ and $M$ denote the input and output alphabet size; a single iteration has a complexity $O(M N)$. We also show how to approximately compute the capacity of memoryless channels having a bounded continuous input alphabet and a countable output alphabet under some mild assumptions on the decay rate of the channel's tail. It is shown that discrete-time Poisson channels fall into this problem class. As an example, we compute sharp upper and lower bounds for the capacity of a discrete-time Poisson channel with a peak-power input constraint.

preprint2014arXiv

A Decomposition Method for Large Scale MILPs, with Performance Guarantees and a Power System Application

Lagrangian duality in mixed integer optimization is a useful framework for problems decomposition and for producing tight lower bounds to the optimal objective, but in contrast to the convex counterpart, it is generally unable to produce optimal solutions directly. In fact, solutions recovered from the dual may be not only suboptimal, but even infeasible. In this paper we concentrate on large scale mixed--integer programs with a specific structure that is of practical interest, as it appears in a variety of application domains such as power systems or supply chain management. We propose a solution method for these structures, in which the primal problem is modified in a certain way, guaranteeing that the solutions produced by the corresponding dual are feasible for the original unmodified primal problem. The modification is simple to implement and the method is amenable to distributed computations. We also demonstrate that the quality of the solutions recovered using our procedure improves as the problem size increases, making it particularly useful for large scale instances for which commercial solvers are inadequate. We illustrate the efficacy of our method with extensive experimentations on a problem stemming from power systems.

preprint2014arXiv

Efficient Approximation of Quantum Channel Capacities

We propose an iterative method for approximating the capacity of classical-quantum channels with a discrete input alphabet and a finite dimensional output, possibly under additional constraints on the input distribution. Based on duality of convex programming, we derive explicit upper and lower bounds for the capacity. To provide an $\varepsilon$-close estimate to the capacity, the presented algorithm requires $O(\tfrac{(N \vee M) M^3 \log(N)^{1/2}}{\varepsilon})$, where $N$ denotes the input alphabet size and $M$ the output dimension. We then generalize the method for the task of approximating the capacity of classical-quantum channels with a bounded continuous input alphabet and a finite dimensional output. For channels with a finite dimensional quantum mechanical input and output, the idea of a universal encoder allows us to approximate the Holevo capacity using the same method. In particular, we show that the problem of approximating the Holevo capacity can be reduced to a multidimensional integration problem. For families of quantum channels fulfilling a certain assumption we show that the complexity to derive an $\varepsilon$-close solution to the Holevo capacity is subexponential or even polynomial in the problem size. We provide several examples to illustrate the performance of the approximation scheme in practice.

preprint2013arXiv

Performance Bounds for the Scenario Approach and an Extension to a Class of Non-convex Programs

We consider the Scenario Convex Program (SCP) for two classes of optimization problems that are not tractable in general: Robust Convex Programs (RCPs) and Chance-Constrained Programs (CCPs). We establish a probabilistic bridge from the optimal value of SCP to the optimal values of RCP and CCP in which the uncertainty takes values in a general, possibly infinite dimensional, metric space. We then extend our results to a certain class of non-convex problems that includes, for example, binary decision variables. In the process, we also settle a measurability issue for a general class of scenario programs, which to date has been addressed by an assumption. Finally, we demonstrate the applicability of our results on a benchmark problem and a problem in fault detection and isolation.

preprint2013arXiv

Symbolic control of stochastic systems via approximately bisimilar finite abstractions

Symbolic approaches to the control design over complex systems employ the construction of finite-state models that are related to the original control systems, then use techniques from finite-state synthesis to compute controllers satisfying specifications given in a temporal logic, and finally translate the synthesized schemes back as controllers for the concrete complex systems. Such approaches have been successfully developed and implemented for the synthesis of controllers over non-probabilistic control systems. In this paper, we extend the technique to probabilistic control systems modeled by controlled stochastic differential equations. We show that for every stochastic control system satisfying a probabilistic variant of incremental input-to-state stability, and for every given precision $\varepsilon>0$, a finite-state transition system can be constructed, which is $\varepsilon$-approximately bisimilar (in the sense of moments) to the original stochastic control system. Moreover, we provide results relating stochastic control systems to their corresponding finite-state transition systems in terms of probabilistic bisimulation relations known in the literature. We demonstrate the effectiveness of the construction by synthesizing controllers for stochastic control systems over rich specifications expressed in linear temporal logic. The discussed technique enables a new, automated, correct-by-construction controller synthesis approach for stochastic control systems, which are common mathematical models employed in many safety critical systems subject to structured uncertainty and are thus relevant for cyber-physical applications.

Peyman Mohajerin Esfahani

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

From Optimization to Control: Quasi Policy Iteration

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards

Tight Generalization Bounds for Noiseless Inverse Optimization

Adaptive Composite Online Optimization: Predictions in Static and Dynamic Environments

Multimode Diagnosis for Switched Affine Systems with Noisy Measurement

Multiple Faults Estimation in Dynamical Systems: Tractable Design and Performance Bounds

The Nonconvex Geometry of Linear Inverse Problems

Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

Learning robust control for LQR systems with multiplicative noise via policy gradient

Robust Control Design for Linear Systems via Multiplicative Noise

Security Versus Privacy

A Tractable Fault Detection and Isolation Approach for Nonlinear Systems with Probabilistic Performance

Approximations of Stochastic Hybrid Systems: A Compositional Approach

Distributionally Robust Logistic Regression

Efficient Approximation of Channel Capacities

A Decomposition Method for Large Scale MILPs, with Performance Guarantees and a Power System Application

Efficient Approximation of Quantum Channel Capacities

Performance Bounds for the Scenario Approach and an Extension to a Class of Non-convex Programs

Symbolic control of stochastic systems via approximately bisimilar finite abstractions