Researcher profile

Prashant G. Mehta

Prashant G. Mehta contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
7 topics
4 close collaborators

Published work

12 published items

Preprint, 2026, arXiv

Transformer-like Inference from Optimal Control

Decoder-only transformers compute the conditional probability of the next token from a sequence of past observations. This paper derives, from first principles, inference architectures that solve the same prediction problem and, in doing so, recovers transformer-like layer operations as a consequence of optimal control theory. The framework is developed for two model classes: a nonlinear model of discrete-valued processes, directly motivated by the transformer, and a linear Gaussian model as a tractable baseline. For both model classes, the prediction objective is reformulated as an optimal control problem whose solution yields an explicit inference algorithm, the dual filter, with a layer structure that mirrors the layer structure of a decoder-only transformer. Numerical experiments compare the optimal control to attention weights from a trained transformer. These experiments reveal that when the embedding dimension is insufficient, the transformer implicitly exploits non-Markovian structure.
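As a point of reference for the prediction problem described in this abstract, here is a minimal sketch of next-observation prediction for a small hidden Markov model using the standard forward filter. It is not the paper's dual filter. The choice of an HMM as the discrete-valued model, the matrices `A` and `C`, the prior, and the function names are all illustrative assumptions.

```python
# Hypothetical 2-state HMM with a 3-symbol vocabulary (illustrative numbers only).
A = [[0.9, 0.1],        # A[i][j] = P(next state = j | state = i)
     [0.2, 0.8]]
C = [[0.7, 0.2, 0.1],   # C[i][z] = P(observation = z | state = i)
     [0.1, 0.3, 0.6]]
PRIOR = [0.5, 0.5]      # P(state) before any observation


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def next_token_probs(observations):
    """Return P(next observation | past observations) via the forward filter."""
    n_states = len(A)
    belief = PRIOR[:]
    for z in observations:
        # Bayes update with the observed symbol z.
        belief = normalize([belief[i] * C[i][z] for i in range(n_states)])
        # Propagate the belief one step through the transition matrix.
        belief = [sum(belief[i] * A[i][j] for i in range(n_states))
                  for j in range(n_states)]
    # Predict the next symbol from the current state distribution.
    return [sum(belief[i] * C[i][z] for i in range(n_states))
            for z in range(len(C[0]))]


probs = next_token_probs([0, 0, 2, 1])
print(probs)
```

The output is a probability vector over the vocabulary, which is the same kind of object a decoder-only transformer produces for the next token.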

Preprint, 2022, arXiv

Controlled Interacting Particle Algorithms for Simulation-based Reinforcement Learning

This paper is concerned with optimal control problems for control systems in continuous time, and interacting particle system methods designed to construct approximate control solutions. Particular attention is given to the linear quadratic (LQ) control problem. There is a growing interest in re-visiting this classical problem, in part due to the successes of reinforcement learning (RL). The main question of this body of research (and also of our paper) is to approximate the optimal control law without explicitly solving the Riccati equation. A novel simulation-based algorithm, namely a dual ensemble Kalman filter (EnKF), is introduced. The algorithm is used to obtain formulae for optimal control, expressed entirely in terms of the EnKF particles. An extension to the nonlinear case is also presented. The theoretical results and algorithms are illustrated with numerical experiments.
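For context, the classical baseline that this abstract aims to avoid is the explicit Riccati solution. The sketch below solves the scalar infinite-horizon LQ problem in closed form, assuming dynamics dx/dt = a x + b u and cost equal to the integral of q x^2 + r u^2. It is not the dual EnKF algorithm; the parameter values and the function name are illustrative assumptions.

```python
import math


def scalar_lq_gain(a, b, q, r):
    """Solve 2*a*p - (b**2 / r) * p**2 + q = 0 for p > 0; return (p, k)."""
    s = math.sqrt(a * a + b * b * q / r)
    p = r * (a + s) / (b * b)   # stabilizing (positive) root of the Riccati equation
    k = b * p / r               # optimal feedback law is u = -k * x
    return p, k


a, b, q, r = 1.0, 1.0, 1.0, 1.0   # unstable open loop, illustrative values
p, k = scalar_lq_gain(a, b, q, r)
closed_loop = a - b * k           # closed-loop pole, should be negative
residual = 2 * a * p - (b * b / r) * p * p + q
print(p, k, closed_loop, residual)
```

A simulation-based method in the spirit of the abstract would estimate `k` from particles rather than from this formula; the closed form is useful mainly as a check.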

Preprint, 2022, arXiv

Duality for Nonlinear Filtering I: Observability

This paper is concerned with the development and use of duality theory for a hidden Markov model (HMM) with white noise observations. The main contribution of this work is to introduce a backward stochastic differential equation (BSDE) as a dual control system. A key outcome is that stochastic observability (resp. detectability) of the HMM is expressed in dual terms: as controllability (resp. stabilizability) of the dual control system. All aspects of controllability, namely, definition of controllable space and controllability gramian, along with their properties and explicit formulae, are discussed. The proposed duality is shown to be an exact extension of the classical duality in linear systems theory. One can then relate and compare the linear and the nonlinear systems. A side-by-side summary of this relationship is given in a tabular form (Table II).
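The classical linear duality that this abstract extends can be checked numerically: the pair (A, H) is observable exactly when the dual pair (A^T, H^T) is controllable. The sketch below verifies this for a hypothetical two-state system; the matrices and helper names are illustrative assumptions, and nothing here covers the nonlinear BSDE construction.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


A = [[0.0, 1.0],
     [-2.0, -3.0]]   # hypothetical state matrix
H = [[1.0, 0.0]]     # hypothetical observation matrix (observe the first state)

# Observability matrix of (A, H): rows H and H A.
obs = [H[0], matmul(H, A)[0]]

# Controllability matrix of the dual pair (A^T, H^T): columns H^T and A^T H^T.
At, Ht = transpose(A), transpose(H)
AtHt = matmul(At, Ht)
ctrl = [[Ht[0][0], AtHt[0][0]],
        [Ht[1][0], AtHt[1][0]]]

print(det2(obs), det2(ctrl))
```

The controllability matrix of the dual pair is the transpose of the observability matrix, so both have full rank together or not at all.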

Preprint, 2022, arXiv

Duality for Nonlinear Filtering II: Optimal Control

This paper is concerned with the development and use of duality theory for a nonlinear filtering model with white noise observations. The main contribution of this paper is to introduce a stochastic optimal control problem as a dual to the nonlinear filtering problem. The mathematical statement of the dual relationship between the two problems is given in the form of a duality principle. The constraint for the optimal control problem is the backward stochastic differential equation (BSDE) introduced in the companion paper. The optimal control solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The proposed duality is shown to be an exact extension of the classical Kalman-Bucy duality, and different from other types of optimal control and variational formulations given in literature.

Preprint, 2022, arXiv

How does a Rational Agent Act in an Epidemic?

Evolution of disease in a large population is a function of the top-down policy measures from a centralized planner, as well as the self-interested decisions (to be socially active) of individual agents in a large heterogeneous population. This paper is concerned with understanding the latter based on a mean-field type optimal control model. Specifically, the model is used to investigate the role of partial information on an agent's decision-making, and study the impact of such decisions by a large number of agents on the spread of the virus in the population. The motivation comes from the presymptomatic and asymptomatic spread of the COVID-19 virus where an agent unwittingly spreads the virus. We show that even in a setting with fully rational agents, limited information on the viral state can result in epidemic growth.

Preprint, 2022, arXiv

Optimality vs Stability Trade-off in Ensemble Kalman Filters

This paper is concerned with optimality and stability analysis of a family of ensemble Kalman filter (EnKF) algorithms. EnKF is commonly used as an alternative to the Kalman filter for high-dimensional problems, where storing the covariance matrix is computationally expensive. The algorithm consists of an ensemble of interacting particles driven by a feedback control law. The control law is designed such that, in the linear Gaussian setting and asymptotic limit of infinitely many particles, the mean and covariance of the particles follow the exact mean and covariance of the Kalman filter. The problem of finding a control law that is exact does not have a unique solution, reminiscent of the problem of finding a transport map between two distributions. A unique control law can be identified by introducing control cost functions, that are motivated by the optimal transportation problem or Schrödinger bridge problem. The objective of this paper is to study the relationship between optimality and long-term stability of a family of exact control laws. Remarkably, the control law that is optimal in the optimal transportation sense leads to an EnKF algorithm that is not stable.

Preprint, 2021, arXiv

Feedback Particle Filter for Collective Inference

The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number ($M$) of non-interacting agents (targets) with a large number ($M$) of non-agent specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-$M$ limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with $M=1$) of these more general results. The simulations help show that the algorithm well approximates the empirical distribution of the hidden states for large $M$.

Preprint, 2021, arXiv

Optimal Transportation Methods in Nonlinear Filtering: The feedback particle filter

Feedback particle filter (FPF) is a Monte-Carlo (MC) algorithm to approximate the solution of a stochastic filtering problem. In contrast to conventional particle filters, the Bayesian update step in FPF is implemented via a mean-field type feedback control law. The objective for this paper is to situate the development of FPF and related controlled interacting particle system algorithms within the framework of optimal transportation theory. Starting from the simplest setting of the Bayes' update formula, a coupling viewpoint is introduced to construct particle filters. It is shown that the conventional importance sampling resampling particle filter implements an independent coupling. Design of optimal couplings is introduced first for the simple Gaussian settings and subsequently extended to derive the FPF algorithm. The final half of the paper provides a review of some of the salient aspects of the FPF algorithm including the feedback structure, algorithms for gain function design, and comparison with conventional particle filters. The comparison serves to illustrate the benefit of feedback in particle filtering.
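To illustrate the coupling viewpoint in the simplest Gaussian setting mentioned in this abstract, the sketch below moves prior particles to the posterior with a deterministic affine map, instead of weighting and resampling them. In one dimension this increasing affine map is the optimal transport map between two Gaussians. The scalar observation model y = x + noise, the use of empirical particle moments, and all names and numbers are illustrative assumptions; this is not the FPF algorithm itself.

```python
import math
import random
import statistics


def kalman_update(mean, var, y, obs_var):
    """Posterior mean and variance for y = x + noise, noise ~ N(0, obs_var)."""
    gain = var / (var + obs_var)
    return mean + gain * (y - mean), (1.0 - gain) * var


def transport_particles(particles, y, obs_var):
    """Push particles through the affine map matching the posterior moments."""
    mean = statistics.fmean(particles)
    var = statistics.pvariance(particles, mu=mean)
    post_mean, post_var = kalman_update(mean, var, y, obs_var)
    scale = math.sqrt(post_var / var)
    # Each particle moves deterministically; no weights, no resampling.
    return [post_mean + scale * (x - mean) for x in particles]


random.seed(0)
prior = [random.gauss(0.0, 2.0) for _ in range(1000)]
posterior = transport_particles(prior, y=1.5, obs_var=1.0)
print(statistics.fmean(posterior), statistics.pvariance(posterior))
```

Because every particle is moved rather than reweighted, the ensemble keeps equal weights after the update, which is the practical contrast with importance sampling and resampling.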

Preprint, 2020, arXiv

A Dual Characterization of Observability for Stochastic Systems

This paper is concerned with a characterization of observability for a continuous-time hidden Markov model where the state evolves as a general continuous-time Markov process and the observation process is modeled as a nonlinear function of the state corrupted by Gaussian measurement noise. The main technical tool is based on the recently discovered duality relationship between minimum variance estimation and stochastic optimal control: observability is defined as the dual of controllability for a certain backward stochastic differential equation. Based on the dual formulation, a test for observability is presented and related to the literature. The proposed duality-based framework allows one to easily relate and compare the linear and the nonlinear systems. A side-by-side summary of this relationship is given in tabular form (Table 1).

Preprint, 2020, arXiv

Convex Q-Learning, Part 1: Deterministic Optimal Control

It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging: does the projected Bellman equation have a solution? If so, is the solution useful in the sense of generating a good policy? And, if the preceding questions are answered in the affirmative, is the algorithm consistent? These questions are unanswered even in the special case of Q-function approximations that are linear in the parameter. The challenge seems paradoxical, given the long history of convex analytic approaches to dynamic programming. The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular overparameterization that lends itself to applications in reinforcement learning. The main conclusions are summarized as follows: (i) A new class of convex Q-learning algorithms is introduced based on the convex relaxation of the Bellman equation. Convergence is established under general conditions, including a linear function approximation for the Q-function. (ii) A batch implementation appears similar to the famed DQN algorithm (one engine behind AlphaZero). It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation: (a) it is shown that both seek solutions to the same fixed point equation, and (b) the ODE approximations for the two algorithms coincide, and little is known about the stability of this ODE. These results are obtained for deterministic nonlinear systems with total cost criterion. Many extensions are proposed, including kernel implementation, and extension to MDP models.

Preprint, 2020, arXiv

Deep FPF: Gain function approximation in high-dimensional setting

In this paper, we present a novel approach to approximate the gain function of the feedback particle filter (FPF). The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The numerical problem is to approximate the exact gain function using only finitely many particles sampled from the probability distribution. Inspired by the recent success of deep learning methods, we represent the gain function as a gradient of the output of a neural network. Thereupon considering a certain variational formulation of the Poisson equation, an optimization problem is posed for learning the weights of the neural network. A stochastic gradient algorithm is described for this purpose. The proposed approach has two significant properties/advantages: (i) The stochastic optimization algorithm allows one to process, in parallel, only a batch of samples (particles) ensuring good scaling properties with the number of particles; (ii) The remarkable representation power of neural networks means that the algorithm is potentially applicable and useful to solve high-dimensional problems. We numerically establish these two properties and provide extensive comparison to the existing approaches.
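A simple baseline for the gain approximation problem described in this abstract is the constant-gain approximation, which replaces the gain function by a single number estimated from particles as the empirical covariance between the state and the observation function. The sketch below assumes a scalar state, a scalar observation function `h`, and unit observation noise. It is not the neural-network method of the paper, and the function name and test values are illustrative assumptions.

```python
import random
import statistics


def constant_gain(particles, h):
    """Constant-gain estimate: (1/N) * sum of (h(x_i) - h_hat) * x_i."""
    h_vals = [h(x) for x in particles]
    h_hat = statistics.fmean(h_vals)
    return statistics.fmean([(hv - h_hat) * x for hv, x in zip(h_vals, particles)])


random.seed(1)
particles = [random.gauss(0.5, 1.5) for _ in range(2000)]

# Linear observation function: the estimate reduces to slope * empirical variance,
# which matches the Kalman gain for unit observation noise.
k_linear = constant_gain(particles, lambda x: 2.0 * x)

# Nonlinear observation function: still a single number, so only an approximation.
k_cubic = constant_gain(particles, lambda x: x ** 3)

print(k_linear, 2.0 * statistics.pvariance(particles), k_cubic)
```

The constant gain uses only a batch of samples, as the learned gain does, but it cannot represent state-dependent structure, which is the gap a richer function class is meant to close.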

Preprint, 2020, arXiv

On the Lyapunov Foster criterion and Poincaré inequality for Reversible Markov Chains

This paper presents an elementary proof of stochastic stability of a discrete-time reversible Markov chain starting from a Foster-Lyapunov drift condition. Besides its relative simplicity, there are two salient features of the proof: (i) it relies entirely on functional-analytic non-probabilistic arguments; and (ii) it makes explicit the connection between a Foster-Lyapunov function and Poincaré inequality. The proof is used to derive an explicit bound for the spectral gap. An extension to the non-reversible case is also presented.
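The quantities named in this abstract can be computed by hand for the smallest reversible chain. The sketch below builds a two-state chain, checks detailed balance, computes the spectral gap, and verifies the Poincaré inequality Var(f) <= E(f, f) / gap, which holds with equality in the two-state case. It does not reproduce the paper's drift-condition argument; the transition probabilities and the test function `f` are illustrative assumptions.

```python
a, b = 0.3, 0.1                      # hypothetical transition probabilities
P = [[1 - a, a],
     [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]      # stationary distribution

# Reversibility (detailed balance): pi_i * P_ij == pi_j * P_ji.
detailed_balance_error = abs(pi[0] * P[0][1] - pi[1] * P[1][0])

# The eigenvalues of a 2x2 stochastic matrix are 1 and (trace - 1).
second_eigenvalue = P[0][0] + P[1][1] - 1
spectral_gap = 1 - second_eigenvalue  # equals a + b


def variance(f):
    """Variance of f under the stationary distribution."""
    mean = sum(pi[i] * f[i] for i in range(2))
    return sum(pi[i] * (f[i] - mean) ** 2 for i in range(2))


def dirichlet_form(f):
    """E(f, f) = 0.5 * sum over i, j of pi_i * P_ij * (f_i - f_j)^2."""
    return 0.5 * sum(pi[i] * P[i][j] * (f[i] - f[j]) ** 2
                     for i in range(2) for j in range(2))


f = [2.0, -1.0]
print(spectral_gap, variance(f), dirichlet_form(f) / spectral_gap)
```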