Source author record

Ashutosh Nayyar

Ashutosh Nayyar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Systems and Control, math.OC, Computer Science and Game Theory, eess.SY, Machine Learning, Artificial Intelligence, Information Theory, math.IT, Multiagent Systems, Other Computer Science

Catalog footprint

What is connected

22 works

10 topics

4 close collaborators

preprint, 2026, arXiv

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

Behavior Foundation Models (BFMs) enable scalable imitation learning (IL) by pretraining task-agnostic representations that can be rapidly adapted to new tasks. However, existing BFMs assume fixed environment dynamics, limiting their robustness under real-world shifts such as changes in friction, actuation, or sensor noise. We address this by formulating BFM task inference as a robust minimax optimization problem, enabling adaptation to worst-case dynamics perturbations without modifying pretraining. To the best of our knowledge, this is the first BFM-based framework that achieves robustness to dynamics shifts while relying solely on offline data from a single nominal environment. Our approach significantly outperforms standard BFM and robust offline IL baselines under dynamics shifts. These results demonstrate that robust policies can be obtained entirely at task-inference time, improving the practicality of BFMs in dynamic settings.

preprint, 2022, arXiv

Optimal Communication and Control Strategies for a Multi-Agent System in the Presence of an Adversary

We consider a multi-agent system in which a decentralized team of agents controls a stochastic system in the presence of an adversary. Instead of committing to a fixed information sharing protocol, the agents can strategically decide at each time whether to share their private information with each other or not. The agents incur a cost whenever they communicate with each other and the adversary may eavesdrop on their communication. Thus, the agents in the team must effectively coordinate with each other while being robust to the adversary's malicious actions. We model this interaction between the team and the adversary as a stochastic zero-sum game where the team aims to minimize a cost while the adversary aims to maximize it. Under some assumptions on the adversary's capabilities, we characterize a min-max control and communication strategy for the team. We supplement this characterization with several structural results that can make the computation of the min-max strategy more tractable.

preprint, 2022, arXiv

Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints

Autonomous agents often operate in scenarios where the state is partially observed. In addition to maximizing their cumulative reward, agents must execute complex tasks with rich temporal and logical structures. These tasks can be expressed using temporal logic languages like finite linear temporal logic (LTL_f). This paper, for the first time, provides a structured framework for designing agent policies that maximize the reward while ensuring that the probability of satisfying the temporal logic specification is sufficiently high. We reformulate the problem as a constrained partially observable Markov decision process (POMDP) and provide a novel approach that can leverage off-the-shelf unconstrained POMDP solvers for solving it. Our approach guarantees approximate optimality and constraint satisfaction with high probability. We demonstrate its effectiveness by implementing it on several models of interest.

preprint, 2020, arXiv

Optimal Dynamic Mechanism Design with Stochastic Supply and Flexible Consumers

We consider the problem of designing an expected-revenue maximizing mechanism for allocating multiple non-perishable goods of $k$ varieties to flexible consumers over $T$ time steps. In our model, a random number of goods of each variety may become available to the seller at each time and a random number of consumers may enter the market at each time. Each consumer is present in the market for one time step and wants to consume one good of one of its desired varieties. Each consumer is associated with a flexibility level that indicates the varieties of the goods it is equally interested in. A consumer's flexibility level and the utility it gets from consuming a good of its desired varieties are its private information. We characterize the allocation rule for a Bayesian incentive compatible, individually rational and expected revenue maximizing mechanism in terms of the solution to a dynamic program. The corresponding payment function is also specified in terms of the optimal allocation function. We leverage the structure of the consumers' flexibility model to simplify the dynamic program and provide an alternative description of the optimal mechanism in terms of thresholds computed by the dynamic program.

preprint, 2020, arXiv

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can generate sub-linear in $T$ regret, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism among the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors). Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.

preprint, 2020, arXiv

Testing for Anomalies: Active Strategies and Non-asymptotic Analysis

The problem of verifying whether a multi-component system has anomalies or not is addressed. Each component can be probed over time in a data-driven manner to obtain noisy observations that indicate whether the selected component is anomalous or not. The aim is to minimize the probability of incorrectly declaring the system to be free of anomalies while ensuring that the probability of correctly declaring it to be safe is sufficiently large. This problem is modeled as an active hypothesis testing problem in the Neyman-Pearson setting. Component-selection and inference strategies are designed and analyzed in the non-asymptotic regime. For a specific class of homogeneous problems, stronger (with respect to prior work) non-asymptotic converse and achievability bounds are provided.

preprint, 2016, arXiv

Decentralized Control Problems with Substitutable Actions

We consider a decentralized system with multiple controllers and define substitutability of one controller by another in open-loop strategies. We explore the implications of this property on the optimization of closed-loop strategies. In particular, we focus on the decentralized LQG problem with substitutable actions. Even though the problem we formulate does not belong to the known classes of "simpler" decentralized problems such as partially nested or quadratically invariant problems, our results show that, under the substitutability assumption, linear strategies are optimal and we provide a complete state space characterization of optimal strategies. We also identify a family of information structures that all give the same optimal cost as the centralized information structure under the substitutability assumption. Our results suggest that open-loop substitutability can work as a counterpart of the information structure requirements that enable simplification of decentralized control problems.

preprint, 2016, arXiv

Dynamic Teams and Decentralized Control Problems with Substitutable Actions

This paper considers two problems -- a dynamic team problem and a decentralized control problem. The problems we consider do not belong to the known classes of "simpler" dynamic team/decentralized control problems such as partially nested or quadratically invariant problems. However, we show that our problems admit simple solutions under an assumption referred to as the substitutability assumption. Intuitively, substitutability in a team (resp. decentralized control) problem means that the effects of one team member's (resp. controller's) action on the cost function and the information (resp. state dynamics) can be achieved by an action of another member (resp. controller). For the non-partially-nested LQG dynamic team problem, it is shown that under certain conditions linear strategies are optimal. For the non-partially-nested decentralized LQG control problem, the state structure can be exploited to obtain optimal control strategies with recursively update-able sufficient statistics. These results suggest that substitutability can work as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems.

preprint, 2016, arXiv

Optimal Local and Remote Controllers with Unreliable Communication

We consider a decentralized optimal control problem for a linear plant controlled by two controllers, a local controller and a remote controller. The local controller directly observes the state of the plant and can inform the remote controller of the plant state through a packet-drop channel. We assume that the remote controller is able to send acknowledgments to the local controller to signal the successful receipt of transmitted packets. The objective of the two controllers is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested LQG problem, we obtain explicit optimal strategies for the two controllers. In the optimal strategies, both controllers compute a common estimate of the plant state based on the common information. The remote controller's action is linear in the common estimated state, and the local controller's action is linear in both the actual state and the common estimated state.

preprint, 2014, arXiv

Common Information based Markov Perfect Equilibria for Linear-Gaussian Games with Asymmetric Information

We consider a class of two-player dynamic stochastic nonzero-sum games where the state transition and observation equations are linear, and the primitive random variables are Gaussian. Each controller acquires possibly different dynamic information about the state process and the other controller's past actions and observations. This leads to a dynamic game of asymmetric information among the controllers. Building on our earlier work on finite games with asymmetric information, we devise an algorithm to compute a Nash equilibrium by using the common information among the controllers. We call such equilibria common information based Markov perfect equilibria of the game, which can be viewed as a refinement of Nash equilibrium in games with asymmetric information. If the players' cost functions are quadratic, then we show that under certain conditions a unique common information based Markov perfect equilibrium exists. Furthermore, this equilibrium can be computed by solving a sequence of linear equations. We also show through an example that there could be other Nash equilibria in a game of asymmetric information, not corresponding to common information based Markov perfect equilibria.

preprint, 2014, arXiv

Duration-Differentiated Services in Electricity

The integration of renewable sources poses challenges at the operational and economic levels of the power grid. In terms of keeping the balance between supply and demand, the usual scheme of supply following load may not be appropriate for large penetration levels of uncertain and intermittent renewable supply. In this paper, we focus on an alternative scheme in which the load follows the supply, exploiting the flexibility associated with the demand side. We consider a model of flexible loads that are to be serviced by zero-marginal-cost renewable power together with conventional generation if necessary. Each load demands 1 kW for a specified number of time slots within an operational period. The flexibility of a load resides in the fact that the service may be delivered over any slots within the operational period. Loads therefore require flexible energy services that are differentiated by the demanded duration. We focus on two problems associated with duration-differentiated loads. The first problem deals with the operational decisions that a supplier has to make to serve a given set of duration-differentiated loads. The second problem focuses on a market implementation for duration-differentiated services. We give necessary and sufficient conditions under which the available power can service the loads, and we describe an algorithm that constructs an appropriate allocation. In the event the available supply is inadequate, we characterize the minimum amount of power that must be purchased to service the loads. Next we consider a forward market where consumers can purchase duration-differentiated energy services. We first characterize social welfare maximizing allocations in this forward market and then show the existence of an efficient competitive equilibrium.
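To make the load model in this abstract concrete, here is a minimal sketch of one plausible allocation rule: in every slot, serve the loads with the longest leftover duration first. The abstract does not name its algorithm, so the greedy rule, the function name `allocate_lldf`, and the whole-kW supply profile are assumptions made for illustration only.

```python
def allocate_lldf(durations, supply):
    """Greedy allocation sketch (assumed rule, not taken from the abstract).

    durations: required number of slots per load; each load draws 1 kW in a served slot.
    supply:    available power per slot, in whole kW.
    Returns (schedule, shortfall): schedule[t] lists the loads served in slot t,
    and shortfall counts the load-slots left unserved at the end of the period.
    """
    remaining = list(durations)
    schedule = []
    for power in supply:
        # Loads that still need service, longest leftover duration first.
        pending = sorted(
            (i for i, r in enumerate(remaining) if r > 0),
            key=lambda i: -remaining[i],
        )
        # A load can absorb at most 1 kW per slot, so serve at most `power` loads.
        served = pending[:power]
        for i in served:
            remaining[i] -= 1
        schedule.append(served)
    return schedule, sum(remaining)


if __name__ == "__main__":
    print(allocate_lldf([3, 2, 1], [2, 2, 2]))
    print(allocate_lldf([3], [3, 0, 0]))
```

The second call illustrates why total energy alone does not settle adequacy in this model: 3 kW is available, but all of it arrives in one slot, and a load can use only 1 kW per slot.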

preprint, 2014, arXiv

Optimal Control for LQG Systems on Graphs---Part I: Structural Results

In this two-part paper, we identify a broad class of decentralized output-feedback LQG systems for which the optimal control strategies have a simple intuitive estimation structure and can be computed efficiently. Roughly, we consider the class of systems for which the coupling of dynamics among subsystems and the inter-controller communication is characterized by the same directed graph. Furthermore, this graph is assumed to be a multitree, that is, its transitive reduction can have at most one directed path connecting each pair of nodes. In this first part, we derive sufficient statistics that may be used to aggregate each controller's growing available information. Each controller must estimate the states of the subsystems that it affects (its descendants) as well as the subsystems that it observes (its ancestors). The optimal control action for a controller is a linear function of the estimate it computes as well as the estimates computed by all of its ancestors. Moreover, these state estimates may be updated recursively, much like a Kalman filter.
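The multitree condition in this abstract can be checked mechanically. The sketch below assumes the graph is supplied already in transitively reduced form, as a node list and an edge list, and tests whether it is acyclic with at most one directed path between every ordered pair of nodes. The function name `is_multitree` and the input format are illustrative choices, not taken from the paper.

```python
def is_multitree(nodes, edges):
    """True if the directed graph is acyclic and every ordered pair of
    nodes is joined by at most one directed path."""
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: builds a topological order and detects cycles.
    ready = [v for v in nodes if indeg[v] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        return False  # a cycle is present

    # paths[u][w] counts directed paths from u to w, filled in reverse
    # topological order so that successors are finished first.
    paths = {v: {} for v in nodes}
    for u in reversed(order):
        for v in succ[u]:
            paths[u][v] = paths[u].get(v, 0) + 1
            for w, count in paths[v].items():
                paths[u][w] = paths[u].get(w, 0) + count
        if any(count > 1 for count in paths[u].values()):
            return False
    return True


if __name__ == "__main__":
    print(is_multitree([1, 2, 3], [(1, 2), (2, 3)]))
    print(is_multitree([1, 2, 3, 4], [(1, 2), (1, 3), (2, 4), (3, 4)]))
```

A chain passes the check, while a diamond fails because its top node reaches its bottom node along two distinct paths.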

preprint, 2014, arXiv

Rate-constrained Energy Services: Allocation Policies and Market Decisions

The integration of renewable generation poses operational and economic challenges for the electricity grid. For the core problem of power balance, the legacy paradigm of tailoring supply to follow random demand may be inappropriate under deep penetration of uncertain and intermittent renewable generation. In this situation, there is an emerging consensus that the alternative approach of controlling demand to follow random supply offers compelling economic benefits in terms of reduced regulation costs. This approach exploits the flexibility of demand side resources and requires sensing, actuation, and communication infrastructure; distributed control algorithms; and viable schemes to compensate participating loads. This paper considers rate-constrained energy services which are a specific paradigm for flexible demand. These services are characterized by a specified delivery window, the total amount of energy that must be supplied over this window, and the maximum rate at which this energy may be delivered. We consider a forward market where rate-constrained energy services are traded. We explore allocation policies and market decisions of a supplier in this market. The supplier owns a generation mix that includes some uncertain renewable generation and may also purchase energy in day-ahead and real-time markets to meet customer demand. The supplier must optimally select the portfolio of rate-constrained services to sell, the amount of day-ahead energy to buy, and the policies for making real-time energy purchases and allocations to customers to maximize its expected profit. We offer solutions to the supplier's decision and control problems to economically provide rate constrained energy services.

preprint, 2014, arXiv

Signaling in sensor networks for sequential detection

Sequential detection problems in sensor networks are considered. The true state of nature/true hypothesis is modeled as a binary random variable $H$ with known prior distribution. There are $N$ sensors making noisy observations about the hypothesis; $\mathcal{N} =\{1,2,\ldots,N\}$ denotes the set of sensors. Sensor $i$ can receive messages from a subset $\mathcal{P}^i \subset \mathcal{N}$ of sensors and send a message to a subset $\mathcal{C}^i \subset \mathcal{N}$. Each sensor is faced with a stopping problem. At each time $t$, based on the observations it has taken so far and the messages it may have received, sensor $i$ can decide to stop and communicate a binary decision to the sensors in $\mathcal{C}^i$, or it can continue taking observations and receiving messages. After sensor $i$'s binary decision has been sent, it becomes inactive. Sensors incur operational costs (cost of taking observations, communication costs etc.) while they are active. In addition, the system incurs a terminal cost that depends on the true hypothesis $H$, the sensors' binary decisions and their stopping times. The objective is to determine decision strategies for all sensors to minimize the total expected cost.

preprint, 2014, arXiv

Sufficient statistics for linear control strategies in decentralized systems with partial history sharing

In decentralized control systems with linear dynamics, quadratic cost, and Gaussian disturbance (also called decentralized LQG systems), linear control strategies are not always optimal. Nonetheless, linear control strategies are appealing due to analytic and implementation simplicity. In this paper, we investigate decentralized LQG systems with partial history sharing information structure and identify finite dimensional sufficient statistics for such systems. Unlike prior work on decentralized LQG systems, we do not assume partial nestedness or quadratic invariance. Our approach is based on the common information approach of Nayyar et al. (2013) and exploits the linearity of the system dynamics and control strategies. To illustrate our methodology, we identify sufficient statistics for linear strategies in decentralized systems where controllers communicate over a strongly connected graph with finite delays, and for decentralized systems consisting of coupled subsystems with control sharing or one-sided one-step delay sharing information structures.

preprint, 2013, arXiv

Structural Results and Explicit Solution for Two-Player LQG Systems on a Finite Time Horizon

It is well-known that linear dynamical systems with Gaussian noise and quadratic cost (LQG) satisfy a separation principle. Finding the optimal controller amounts to solving separate dual problems; one for control and one for estimation. For the discrete-time finite-horizon case, each problem is a simple forward or backward recursion. In this paper, we consider a generalization of the LQG problem in which there are two controllers. Each controller is responsible for one of two system inputs, but has access to different subsets of the available measurements. Our paper has three main contributions. First, we prove a fundamental structural result: sufficient statistics for the controllers can be expressed as conditional means of the global state. Second, we give explicit state-space formulae for the optimal controller. These formulae are reminiscent of the classical LQG solution with dual forward and backward recursions, but with the important difference that they are intricately coupled. Lastly, we show how these recursions can be solved efficiently, with computational complexity comparable to that of the centralized problem.
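For orientation, the classical backward recursion mentioned at the start of this abstract can be sketched for a scalar, single-controller system. This is the textbook centralized control recursion, not the coupled two-controller solution the paper derives. The scalar model x[t+1] = a*x[t] + b*u[t] + noise, the stage cost q*x^2 + r*u^2, and all names below are assumptions for illustration.

```python
def lqr_backward(a, b, q, r, q_terminal, horizon):
    """Backward Riccati recursion for a scalar finite-horizon LQ problem.

    Returns (gains, costs): the optimal control is u[t] = -gains[t] * x[t],
    and costs[t] is the quadratic cost-to-go coefficient at time t.
    """
    costs = [0.0] * (horizon + 1)
    gains = [0.0] * horizon
    costs[horizon] = q_terminal
    for t in range(horizon - 1, -1, -1):
        p_next = costs[t + 1]
        # Gain that minimizes the stage cost plus the cost-to-go.
        gains[t] = a * b * p_next / (r + b * b * p_next)
        # Riccati update of the cost-to-go coefficient.
        costs[t] = q + a * a * p_next - a * b * p_next * gains[t]
    return gains, costs


if __name__ == "__main__":
    gains, costs = lqr_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_terminal=1.0, horizon=5)
    print(gains)
    print(costs)
```

The estimation side of the separation principle is the mirror image: a Kalman filter runs an analogous recursion forward in time on the error variance.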

preprint, 2012, arXiv

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

A general model of decentralized stochastic control called partial history sharing information structure is presented. In this model, at each step the controllers share part of their observation and control history with each other. This general model subsumes several existing models of information sharing as special cases. Based on the information commonly known to all the controllers, the decentralized problem is reformulated as an equivalent centralized problem from the perspective of a coordinator. The coordinator knows the common information and selects prescriptions that map each controller's local information to its control actions. The optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory. This approach provides (a) structural results for optimal strategies, and (b) a dynamic program for obtaining optimal strategies for all controllers in the original decentralized problem. Thus, this approach unifies the various ad-hoc approaches taken in the literature. In addition, the structural results on optimal control strategies obtained by the proposed approach cannot be obtained by the existing generic approach (the person-by-person approach) for obtaining structural results in decentralized problems; and the dynamic program obtained by the proposed approach is simpler than that obtained by the existing generic approach (the designer's approach) for obtaining dynamic programs in decentralized problems.

preprint, 2012, arXiv

Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games

A model of stochastic games where multiple controllers jointly control the evolution of the state of a dynamic system but have access to different information about the state and action processes is considered. The asymmetry of information among the controllers makes it difficult to compute or characterize Nash equilibria. Using common information among the controllers, the game with asymmetric information is shown to be equivalent to another game with symmetric information. Further, under certain conditions, a Markov state is identified for the equivalent symmetric information game and its Markov perfect equilibria are characterized. This characterization provides a backward induction algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner are named common information based Markov perfect equilibria.

preprint, 2012, arXiv

Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor

We consider a remote estimation problem with an energy harvesting sensor and a remote estimator. The sensor observes the state of a discrete-time source which may be a finite state Markov chain or a multi-dimensional linear Gaussian system. It harvests energy from its environment (say, for example, through a solar cell) and uses this energy for the purpose of communicating with the estimator. Due to the randomness of energy available for communication, the sensor may not be able to communicate all the time. The sensor may also want to save its energy for future communications. The estimator relies on messages communicated by the sensor to produce real-time estimates of the source state. We consider the problem of finding a communication scheduling strategy for the sensor and an estimation strategy for the estimator that jointly minimize an expected sum of communication and distortion costs over a finite time horizon. Our goal of joint optimization leads to a decentralized decision-making problem. By viewing the problem from the estimator's perspective, we obtain a dynamic programming characterization for the decentralized decision-making problem that involves optimization over functions. Under some symmetry assumptions on the source statistics and the distortion metric, we show that an optimal communication strategy is described by easily computable thresholds and that the optimal estimate is a simple function of the most recently received sensor observation.

preprint, 2011, arXiv

Revenue Maximization in Spectrum Auctions for Dynamic Spectrum Access

We investigate revenue maximization problems in auctions for dynamic spectrum access. We consider the frequency division and spread spectrum methods of dynamic spectrum sharing. In the frequency division method, a primary spectrum user allocates portions of spectrum to different secondary users. In the spread spectrum method, the primary user allocates transmission powers to each secondary user. In both cases, we assume that a secondary user's utility function is linear in the rate it can achieve by using the available spectrum/power. Assuming strategic users, we present incentive compatible, individually rational and revenue-maximizing mechanisms for the two scenarios.

preprint, 2010, arXiv

Decentralized Detection with Signaling

We consider a sequential problem in decentralized detection. Two observers can make repeated noisy observations of a binary hypothesis on the state of the environment. At any time, either of the two observers can stop and send a final message to the other observer, or it may continue to take more measurements. After an observer has sent its final message, it stops operating. The other observer is then faced with a different stopping problem. At each time instant, it can decide either to stop and declare a final decision on the hypothesis or take another measurement. At each time, the system incurs an operating cost depending on the number of observers that are active at that time. A terminal cost that measures the accuracy of the final decision is incurred at the end. We show that, unlike in other sequential detection problems, stopping rules characterized by two thresholds on an observer's posterior belief no longer guarantee optimality in this problem. Thus the potential for signaling among observers alters the nature of optimal policies. We obtain a new parametric characterization of optimal policies for this problem.
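The two-threshold stopping rule that this abstract contrasts against can be sketched for a single observer. This is the classical single-observer rule, not the parametric characterization obtained in the paper. The binary observation model, with `p1` and `p0` as the probabilities of observing 1 under each hypothesis, and the function names are assumptions for illustration.

```python
def posterior_update(belief, obs, p1, p0):
    """Bayes update of P(H = 1) after one binary observation.

    p1 = P(obs = 1 | H = 1), p0 = P(obs = 1 | H = 0).
    """
    like1 = p1 if obs == 1 else 1.0 - p1
    like0 = p0 if obs == 1 else 1.0 - p0
    weighted = belief * like1
    return weighted / (weighted + (1.0 - belief) * like0)


def two_threshold_rule(observations, prior, p1, p0, lower, upper):
    """Stop and declare H = 1 once the belief reaches `upper`, declare H = 0
    once it falls to `lower`, and otherwise keep observing.

    Returns (decision, stopping_time, belief); decision is None if the
    observations run out before either threshold is crossed.
    """
    belief = prior
    for t, obs in enumerate(observations, start=1):
        belief = posterior_update(belief, obs, p1, p0)
        if belief >= upper:
            return 1, t, belief
        if belief <= lower:
            return 0, t, belief
    return None, len(observations), belief


if __name__ == "__main__":
    print(two_threshold_rule([1, 1, 1], prior=0.5, p1=0.8, p0=0.2, lower=0.1, upper=0.9))
```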

preprint, 2010, arXiv

Optimal Control Strategies in Delayed Sharing Information Structures

The $n$-step delayed sharing information structure is investigated. This information structure comprises $K$ controllers that share their information with a delay of $n$ time steps. This information structure is a link between the classical information structure, where information is shared perfectly between the controllers, and a non-classical information structure, where there is no "lateral" sharing of information among the controllers. Structural results for optimal control strategies for systems with such information structures are presented. A sequential methodology for finding the optimal strategies is also derived. The solution approach provides insight for identifying structural results and sequential decomposition for general decentralized stochastic control problems.
