Researcher profile

Deepanshu Vasal

Deepanshu Vasal contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Trust: 21 (Emerging)
Verification: L1
Status: Unclaimed author
13 works
0 followers
9 topics
4 close collaborators

Published work

13 published items

Preprint · 2022 · arXiv

A dynamic program to achieve capacity of multiple access channel with noiseless feedback

In this paper, we consider the problem of evaluating the capacity expression of a multiple access channel (MAC) with noiseless feedback. So far, the capacity of this channel is known only through a multi-letter directed information expression due to Kramer [1]. Recently, it was shown in [2] that the problem can be posed as a dynamic optimization problem; however, no dynamic program was provided, as the authors claimed that there is no notion of state observed by both senders. In this paper, we build upon [2] to show that such a state does exist, and therefore so does a dynamic program (DP) that decomposes this dynamic optimization problem, equivalently a Bellman fixed-point equation, for evaluating the capacity of this channel. We do so by defining a common belief on the private messages and private beliefs of the two senders, and using this common belief as the state of the system. We further show that this DP can be reduced to a DP whose state is the common belief on the messages alone. This provides a single-letter characterization of the capacity of this channel.
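
As a generic illustration of what solving a Bellman fixed-point equation involves, the sketch below runs value iteration on a hypothetical two-state, two-action model with a discounted criterion chosen for simplicity. It is not the paper's DP, whose state is a common belief over the senders' messages; the names `P`, `R`, `bellman_backup` and `value_iteration`, the discount factor and the toy numbers are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, not the MAC model from the abstract:
# P[s][a] lists (next_state, probability) pairs and R[s][a] is the one-step reward.
Transitions = Dict[int, Dict[int, List[Tuple[int, float]]]]
Rewards = Dict[int, Dict[int, float]]
Values = Dict[int, float]


def bellman_backup(V: Values, P: Transitions, R: Rewards, gamma: float) -> Values:
    """Apply the Bellman optimality operator once: (TV)(s) = max_a [R(s,a) + gamma * E V(s')]."""
    return {
        s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in P[s])
        for s in P
    }


def value_iteration(P: Transitions, R: Rewards, gamma: float = 0.9,
                    tol: float = 1e-10, max_iter: int = 10_000) -> Values:
    """Iterate V <- TV; for gamma < 1, T is a contraction with a unique fixed point V*."""
    V: Values = {s: 0.0 for s in P}
    for _ in range(max_iter):
        V_new = bellman_backup(V, P, R, gamma)
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


# Usage: in state 1, action 1 earns 2 per step forever, so V*(1) = 2 / (1 - 0.9) = 20
# and V*(0) = 1 + 0.9 * 20 = 19.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]}, 1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
V_star = value_iteration(P, R)
```

Because the discounted Bellman operator is a contraction, repeated backups converge to the same fixed point from any starting guess.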

Preprint · 2022 · arXiv

Fault Tolerant Equilibria in Anonymous Games: best response correspondences and fixed points

The notion of fault-tolerant Nash equilibria has been introduced as a way of studying the robustness of Nash equilibria. Under this notion, a fixed number of players are allowed to exhibit faulty behavior, in which they may deviate arbitrarily from an equilibrium strategy. A Nash equilibrium in a game with $N$ players is said to be $\alpha$-tolerant if no non-faulty player wants to deviate from an equilibrium strategy as long as $N-\alpha-1$ other players are playing the equilibrium strategies, i.e., it is robust to deviations from rationality by $\alpha$ faulty players. In prior work, $\alpha$-tolerance has largely been viewed as a property of a given Nash equilibrium. Here, we instead follow Nash's approach for showing the existence of equilibria, namely, through the use of best response correspondences and fixed-point arguments. In this manner, we provide sufficient conditions for the existence of an $\alpha$-tolerant equilibrium. This involves first defining an $\alpha$-tolerant best response correspondence. Given a strategy profile of the non-faulty agents, this correspondence contains the strategies for a non-faulty player that are a best response to any strategy profile of the faulty players. We prove that if this correspondence is non-empty, then it is upper hemicontinuous. This enables us to apply Kakutani's fixed-point theorem and argue that if this correspondence is non-empty for every strategy profile of the non-faulty players, then there exists an $\alpha$-tolerant equilibrium. However, we also illustrate by examples that in many games this best response correspondence will be empty for some strategy profiles even though $\alpha$-tolerant equilibria still exist.
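
The definition of $\alpha$-tolerance can be checked directly in small finite games. The sketch below is a brute-force test for pure strategy profiles only; it does not implement the paper's best response correspondence or fixed-point argument. The function `is_alpha_tolerant`, the `payoff(i, profile)` interface and the three-player coordination game are hypothetical; because each payoff depends only on how many others match, the example game is anonymous.

```python
from itertools import combinations, product
from typing import Callable, Sequence, Tuple

Profile = Tuple[int, ...]
# payoff(i, a) is player i's payoff under the pure action profile a (hypothetical interface).
Payoff = Callable[[int, Profile], float]


def is_alpha_tolerant(profile: Profile, n_actions: Sequence[int],
                      payoff: Payoff, alpha: int, tol: float = 1e-12) -> bool:
    """Brute-force alpha-tolerance check for a pure strategy profile.

    For every player i, every set of alpha faulty players not containing i, and every
    action the faulty players might take, i must not gain by deviating from profile[i].
    """
    n = len(profile)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for faulty in combinations(others, alpha):
            for faulty_actions in product(*(range(n_actions[j]) for j in faulty)):
                actions = list(profile)
                for j, a_j in zip(faulty, faulty_actions):
                    actions[j] = a_j  # faulty players deviate arbitrarily
                on_path = payoff(i, tuple(actions))
                for dev in range(n_actions[i]):
                    actions[i] = dev
                    if payoff(i, tuple(actions)) > on_path + tol:
                        return False
    return True


def match_payoff(i: int, a: Profile) -> float:
    """Anonymous coordination game: payoff = number of other players matching i's action."""
    return float(sum(1 for j, a_j in enumerate(a) if j != i and a_j == a[i]))
```

With `match_payoff`, the profile `(0, 0, 0)` is a Nash equilibrium ($\alpha = 0$) and survives one faulty player, since the remaining players are then indifferent, but two faulty players switching to action 1 give the last player a strict incentive to follow them.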

Preprint · 2022 · arXiv

Linear Coding for AWGN channels with Noisy Output Feedback via Dynamic Programming

The optimal coding scheme for communicating a Gaussian message over an additive white Gaussian noise (AWGN) channel with AWGN output feedback and a limited number of transmissions is unknown. Even if we restrict attention to linear schemes, deriving the optimal coding scheme is still a challenging task. The state-of-the-art linear scheme for channels with noisy feedback is due to Chance and Love, where the coefficients of the linear scheme are numerically optimized based on unique observations [1]. In this paper, we introduce a new class of sequential linear schemes for this channel by introducing a novel linear state process at the transmitter, and we derive the optimal sequential scheme within this class in closed form by formulating a novel dynamic program (DP). We empirically show that our scheme outperforms the state-of-the-art linear scheme in [1] for noisy feedback and coincides with the Schalkwijk-Kailath (SK) scheme for noiseless feedback. We also show that when communicating message bits, as opposed to a Gaussian message, a learning-based approach further improves the reliability of sequential linear schemes. This problem is an instance of decentralized control without any common information and, to the best of our knowledge, the first such scenario where we can derive analytical solutions using a DP.
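
The abstract states that the new scheme coincides with the SK scheme when feedback is noiseless. As background only, the sketch below simulates a textbook SK-style linear scheme for a Gaussian message $\theta \sim \mathcal{N}(0, \sigma^2)$ over an AWGN channel with noise variance $N$, average power $P$ and noiseless feedback, and compares it with the closed form $\sigma^2 (N/(P+N))^K$ after $K$ channel uses. It does not implement the paper's DP-derived scheme for noisy feedback; the function names and parameters are assumptions.

```python
import math
import random


def sk_error_variance(sigma2: float, P: float, N: float, K: int) -> float:
    """Closed-form MSE after K channel uses: sigma2 * (N / (P + N)) ** K."""
    return sigma2 * (N / (P + N)) ** K


def simulate_sk(sigma2: float, P: float, N: float, K: int,
                trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the receiver's final mean-squared error."""
    rng = random.Random(seed)
    total_sq_err = 0.0
    for _ in range(trials):
        theta = rng.gauss(0.0, math.sqrt(sigma2))  # Gaussian message
        est, var = 0.0, sigma2  # receiver's estimate and its error variance
        for _ in range(K):
            # Noiseless feedback tells the transmitter `est`, so it sends the current
            # estimation error scaled to average power P.
            x = math.sqrt(P / var) * (theta - est)
            y = x + rng.gauss(0.0, math.sqrt(N))  # AWGN forward channel
            # Receiver's linear MMSE correction; the error variance shrinks by N / (P + N).
            est += math.sqrt(P * var) / (P + N) * y
            var *= N / (P + N)
        total_sq_err += (theta - est) ** 2
    return total_sq_err / trials
```

For example, `sk_error_variance(1.0, 1.0, 1.0, 3)` evaluates to $0.5^3 = 0.125$, and `simulate_sk` with the same arguments should return a value close to it.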

Preprint · 2022 · arXiv

Master equation of discrete time graphon mean field games and teams

In this paper, we present a sequential decomposition algorithm, equivalent to the master equation, to compute graphon mean field equilibria (GMFE) of graphon mean field games (GMFG) and graphon optimal Markovian policies (GOMPs) of graphon mean field teams (GMFTs). We consider a large population of players sequentially making strategic decisions, where the actions of each player affect their neighbors; these interactions are captured by a graph generated by a known graphon. Each player observes a private state and also common information in the form of a graphon mean-field population state, which represents the empirical networked distribution of the other players' types. We consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute both GMFE and GOMP that depend on both a player's private type and the current (dynamic) population state determined through the graphon. Each step in computing GMFE consists of solving a fixed-point equation, while computing GOMP involves solving an optimization problem. We provide conditions on the model parameters under which such a GMFE exists. Using this algorithm, we obtain the GMFE and GOMP for a specific security setup in cyber-physical systems for different graphons that capture the interactions between the nodes in the system.

Preprint · 2022 · arXiv

Network Design for Social Welfare

In this paper, we consider the problem of network design for network games. We study conditions on the adjacency matrix of the underlying network under which the game can be designed so that its Nash equilibrium coincides with the social optimum. We provide examples of linear-quadratic games that satisfy this condition. Furthermore, we identify conditions on properties of the adjacency matrix that guarantee a unique solution using a variational inequality formulation, and we verify the robustness and continuity of the social cost under perturbations of the network. Finally, we comment on individual rationality and the extension of our results to large random networked games.
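
To make the gap between Nash equilibrium and social optimum concrete, the sketch below assumes a standard linear-quadratic network game (an assumption, not necessarily the model used in the paper) with utility $u_i(x) = a_i x_i - \tfrac{1}{2}x_i^2 + \delta x_i \sum_j G_{ij} x_j$ and zero diagonal $G_{ii} = 0$. The equilibrium solves $x = (I - \delta G)^{-1} a$, while a planner maximizing $\sum_i u_i$ solves $x = (I - \delta (G + G^\top))^{-1} a$; for $\delta \neq 0$ the two coincide exactly when $G^\top x^{\mathrm{NE}} = 0$. The function names are hypothetical.

```python
import numpy as np


def nash_equilibrium(G: np.ndarray, a: np.ndarray, delta: float) -> np.ndarray:
    """Solve the first-order conditions a - x + delta * G x = 0, i.e. x = (I - delta G)^{-1} a."""
    n = len(a)
    return np.linalg.solve(np.eye(n) - delta * G, a)


def social_optimum(G: np.ndarray, a: np.ndarray, delta: float) -> np.ndarray:
    """Maximize sum_i u_i: the gradient a - x + delta (G + G^T) x vanishes at the optimum.

    Assumes delta * lambda_max(G + G^T) < 1, so total welfare is strictly concave.
    """
    n = len(a)
    return np.linalg.solve(np.eye(n) - delta * (G + G.T), a)


# Usage: three-node line graph with identical linear terms.
G_line = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
a = np.ones(3)
x_ne = nash_equilibrium(G_line, a, 0.1)
x_so = social_optimum(G_line, a, 0.1)
```

In this example every component of `x_so` exceeds `x_ne`, because the planner internalizes the positive spillovers; with no edges ($G = 0$) the two coincide.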

Preprint · 2021 · arXiv

Mechanism Design for Large Scale Network Utility Maximization

Network utility maximization (NUM) is a general framework for designing distributed optimization algorithms for large-scale networks. An economic challenge arises in the presence of strategic agents with private information. Existing studies have proposed (economic) mechanisms but have largely neglected the issue of large-scale implementation. Specifically, they require certain modifications to the deployed algorithms, which may incur significant cost. To tackle this challenge, we present the Large-Scale Vickrey-Clarke-Groves (VCG) Mechanism for NUM, with a simpler payment rule characterized by the shadow prices. The Large-Scale VCG Mechanism maximizes the network utility and achieves individual rationality and budget balance. With infinitely many agents, truthfully reporting their types is a dominant strategy for the agents; in the finite case, each agent's incentive to misreport converges quadratically to zero. For practical implementation, we introduce a modified mechanism that possesses an additional important technical property, superimposability, which allows it to be built upon any (potentially distributed) algorithm that optimally solves the NUM problem and ensures that all agents obey the algorithm. We then extend this idea to the dynamic case, in which agents' types evolve dynamically as a controlled Markov process. In this case, the mechanism leads to incentive-compatible actions by the agents in each time slot.
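
For readers unfamiliar with VCG, the sketch below implements the textbook Vickrey-Clarke-Groves mechanism with Clarke pivot payments over a finite set of outcomes. It does not implement the paper's Large-Scale VCG Mechanism, whose payments are characterized by NUM shadow prices; `vcg` and the single-item example are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Outcome = Hashable


def vcg(outcomes: Sequence[Outcome],
        reports: Sequence[Dict[Outcome, float]]) -> Tuple[Outcome, List[float]]:
    """Textbook VCG with Clarke pivot payments; reports[i][o] is agent i's reported value for o."""

    def welfare(outcome: Outcome, exclude: Optional[int] = None) -> float:
        # Sum of reported values, optionally leaving out one agent.
        return sum(v[outcome] for k, v in enumerate(reports) if k != exclude)

    chosen = max(outcomes, key=welfare)  # efficient outcome under the reports
    payments = []
    for i in range(len(reports)):
        # Agent i pays the externality its presence imposes on the others.
        best_without_i = max(welfare(o, exclude=i) for o in outcomes)
        payments.append(best_without_i - welfare(chosen, exclude=i))
    return chosen, payments


# Usage: a single item that can go to agent 0 ("A"), agent 1 ("B"), or nobody.
outcomes = ["A", "B", "none"]
reports = [{"A": 10.0, "B": 0.0, "none": 0.0}, {"A": 0.0, "B": 7.0, "none": 0.0}]
chosen, payments = vcg(outcomes, reports)
```

In this example VCG reduces to a second-price auction: agent 0 wins and pays 7, agent 1's report, while agent 1 pays nothing.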

Preprint · 2020 · arXiv

Decentralized multi-agent reinforcement learning with shared actions

In this paper, we propose a novel model-free reinforcement learning algorithm to compute the optimal policies for a multi-agent system with $N$ cooperative agents, where each agent privately observes its own type and publicly observes the other agents' actions. The goal is to maximize their collective reward. The problem belongs to the broad class of decentralized control problems with partial information. We use the common agent approach, wherein a fictitious common agent picks the best policy based on a belief on the current states of the agents. These beliefs are updated individually for each agent from their current belief and action histories. Updating the belief state without knowledge of the system dynamics is a challenge. In this paper, we employ particle filters, specifically the bootstrap filter, distributively across agents to update the belief. We provide a model-free reinforcement learning (RL) method for this multi-agent partially observable Markov decision process that uses the particle filter and sampled trajectories to estimate the optimal policies for the agents. We showcase our results with a smart grid application where the users strive to reduce the collective cost of power for all the agents in the grid. Finally, we compare the performance of model-based and model-free implementations of the RL algorithm, establishing the effectiveness of the particle filter (PF) method.
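
A bootstrap particle filter propagates samples through the transition model, weights them by the observation likelihood, and resamples. The sketch below applies it to a hypothetical two-state hidden Markov chain with noisy binary observations, so that the particle estimate can be compared with the exact Bayesian filter. The matrices `TRANSITION` and `EMISSION` and both function names are assumptions, and the multi-agent common-agent structure from the abstract is not modeled.

```python
import random
from typing import List, Sequence

# Hypothetical two-state model (not the smart grid model from the abstract).
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]  # TRANSITION[s][s2] = P(next state s2 | state s)
EMISSION = [[0.8, 0.2], [0.3, 0.7]]    # EMISSION[s][o] = P(observation o | state s)


def exact_filter(prior: Sequence[float], observations: Sequence[int]) -> List[float]:
    """Exact Bayesian forward filter, used as a reference."""
    belief = list(prior)
    for o in observations:
        predicted = [sum(belief[s] * TRANSITION[s][s2] for s in range(2)) for s2 in range(2)]
        unnormalized = [predicted[s] * EMISSION[s][o] for s in range(2)]
        z = sum(unnormalized)
        belief = [u / z for u in unnormalized]
    return belief


def bootstrap_filter(prior: Sequence[float], observations: Sequence[int],
                     n_particles: int = 5000, seed: int = 0) -> List[float]:
    """Bootstrap particle filter: propagate, weight by likelihood, resample."""
    rng = random.Random(seed)
    particles = rng.choices(range(2), weights=prior, k=n_particles)
    for o in observations:
        # 1. Propagate each particle by drawing a next state (a simulator call would also work).
        particles = [rng.choices(range(2), weights=TRANSITION[s])[0] for s in particles]
        # 2. Weight each particle by the likelihood of the new observation.
        weights = [EMISSION[s][o] for s in particles]
        # 3. Resample with replacement in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    p1 = sum(particles) / n_particles
    return [1.0 - p1, p1]
```

The propagation step only needs to draw a next state, never to evaluate transition probabilities, which is why this filter can be paired with sampled trajectories when the dynamics are not known in closed form.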

Preprint · 2020 · arXiv

Dynamic information design

We consider the problem of dynamic information design with one sender and one receiver, where the sender observes a private state of the system and takes an action to send a signal, based on its observation, to the receiver. Based on this signal, the receiver takes an action that determines the rewards for both the sender and the receiver and controls the state of the system. In this technical note, we show that this problem can be considered as a dynamic game of asymmetric information, and that its perfect Bayesian equilibrium (PBE) and Stackelberg equilibrium (SE) can be analyzed using the algorithms presented in [1], [2] by the same author (among others). We then extend this model to the case of one sender and multiple receivers and provide algorithms to compute a class of equilibria of this game.

Preprint · 2020 · arXiv

Existence of structured perfect Bayesian equilibrium in dynamic games of asymmetric information

In [1], the authors considered a general finite-horizon model of a dynamic game of asymmetric information, where $N$ players have types evolving as independent Markov processes, and each player observes its own type perfectly as well as the actions of all players. The authors presented a sequential decomposition algorithm to find all structured perfect Bayesian equilibria of the game. The algorithm consists of solving a class of fixed-point equations for each time $t$ and each $\pi_t$, for which the existence of a solution was left as an open question. In this paper, we prove the existence of solutions to these fixed-point equations for compact metric spaces.

Preprint · 2020 · arXiv

Model-free Reinforcement Learning for Non-stationary Mean Field Games

In this paper, we consider finite-horizon, non-stationary mean field games (MFGs) with a large population of homogeneous players sequentially making strategic decisions, where each player is affected by the other players through an aggregate population state termed the mean field state. Each player has a private type that only it can observe, and all players share a mean field population state representing the empirical distribution of the other players' types. Recently, the authors of [1] provided a sequential decomposition algorithm to compute the mean field equilibrium (MFE) for such games, which allows equilibrium policies to be computed in linear rather than exponential time, as was previously required. In this paper, we extend it to the case where the state transitions are not known, proposing a reinforcement learning algorithm based on Expected Sarsa with a policy gradient approach that learns the MFE policy while simultaneously learning the dynamics of the game. We illustrate our results using a cyber-physical security example.
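
The core update in Expected Sarsa replaces the single sampled next action of ordinary Sarsa with an expectation under the current policy. The sketch below shows only that update, for a single agent on a hypothetical deterministic two-state MDP (`STEP`) with an epsilon-greedy policy; the mean field state, the policy-gradient component and the learned dynamics from the abstract are not modeled, and all names and hyperparameters are assumptions.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical deterministic toy MDP: STEP[(state, action)] = (next_state, reward).
STEP: Dict[Tuple[int, int], Tuple[int, float]] = {
    (0, 0): (0, 0.0), (0, 1): (1, 1.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}
N_STATES, N_ACTIONS = 2, 2


def epsilon_greedy_probs(q_row: List[float], eps: float) -> List[float]:
    """Action probabilities of the epsilon-greedy policy for one row of Q."""
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    probs = [eps / len(q_row)] * len(q_row)
    probs[best] += 1.0 - eps
    return probs


def expected_sarsa(steps: int = 20_000, alpha: float = 0.1, gamma: float = 0.9,
                   eps: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Tabular on-policy Expected Sarsa along a single continuing trajectory."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        a = rng.choices(range(N_ACTIONS), weights=epsilon_greedy_probs(Q[s], eps))[0]
        s_next, r = STEP[(s, a)]
        # Expected Sarsa target: average Q(s', .) under the current policy,
        # rather than Q(s', a') for one sampled next action a'.
        probs_next = epsilon_greedy_probs(Q[s_next], eps)
        expected_q = sum(p * q for p, q in zip(probs_next, Q[s_next]))
        Q[s][a] += alpha * (r + gamma * expected_q - Q[s][a])
        s = s_next
    return Q
```

With these settings the learned greedy action should be action 1 in both states, since it earns reward 1 from state 0 and reward 2 per step in state 1.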

Preprint · 2020 · arXiv

Model-free Reinforcement Learning for Stochastic Stackelberg Security Games

In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy that maximizes its own returns, given the knowledge that the follower is going to play the best response to its policy. Thus, both players converge to a pair of policies that form the Stackelberg equilibrium of the game. Recently, [1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium for such games, which allows Markovian equilibrium policies to be computed in linear time as opposed to double-exponential time, as before. In this paper, we extend the idea to an MDP whose dynamics are not known to the players, proposing an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent, which computes the optimal policy based on the information that is common to both players. We present a security game example to illustrate the policy learned by our algorithm.
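
The "leader commits, follower best-responds" logic can be seen in a one-shot, two-target security game. The sketch below computes a strong Stackelberg equilibrium (ties broken in the leader's favor) by grid search over how the defender splits one unit of coverage; it is a static, full-information illustration rather than the sequential, partially observed game in the abstract, and the payoff numbers and function names are hypothetical.

```python
from typing import Sequence, Tuple


def expected(cov: float, if_covered: float, if_uncovered: float) -> float:
    """Expected payoff at a target covered with probability cov."""
    return cov * if_covered + (1.0 - cov) * if_uncovered


def strong_stackelberg_two_targets(
    d_cov: Sequence[float], d_unc: Sequence[float],
    a_cov: Sequence[float], a_unc: Sequence[float],
    grid: int = 1000, tol: float = 1e-9,
) -> Tuple[float, float]:
    """Return (coverage of target 0, defender value) by grid search over commitments."""
    best_c, best_val = 0.0, float("-inf")
    for i in range(grid + 1):
        c = i / grid
        coverage = (c, 1.0 - c)  # one defender resource split across two targets
        att = [expected(coverage[t], a_cov[t], a_unc[t]) for t in range(2)]
        top = max(att)
        # Follower best-responds; ties are broken in the leader's favor (strong SE).
        val = max(expected(coverage[t], d_cov[t], d_unc[t])
                  for t in range(2) if att[t] >= top - tol)
        if val > best_val:
            best_c, best_val = c, val
    return best_c, best_val
```

With defender payoffs `d_cov = [1, 1]`, `d_unc = [-5, -3]` and attacker payoffs `a_cov = [-1, -1]`, `a_unc = [5, 3]`, the attacker is indifferent when target 0 is covered with probability $c = 0.6$ (both targets then yield $5 - 6c = 4c - 1$), and the search should return a coverage near 0.6 with defender value near $-1.4$.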

Preprint · 2020 · arXiv

Sequential decomposition of discrete memoryless channel with noisy feedback

In this paper, we consider a discrete memoryless point-to-point channel with noisy feedback, where a sender has a private message that she wants to communicate to a receiver by sequentially transmitting symbols over a noisy channel. After each transmission, she receives noisy feedback of the symbol received by the receiver. The goal is to design a transmission control strategy for the sender that minimizes the average probability of error. This is an instance of decentralized control of information where the two controllers, the sender and the receiver, have no common information. No methodology exists in the literature that provides a notion of "state" and a dynamic program to find optimal policies for this problem. In this paper, we introduce a notion of state, based on which we provide a sequential decomposition methodology that finds optimal policies within the class of Markov strategies with respect to this state (which need not be globally optimal). This allows us to decompose the problem across time and reduces the dependence of the complexity on time from double exponential to linear.

Preprint · 2018 · arXiv

Decentralized Bayesian learning in dynamic games: A framework for studying informational cascades

We study the problem of Bayesian learning in a dynamical system involving strategic agents with asymmetric information. In a series of seminal papers in the literature, this problem has been investigated under a simplifying model where myopically selfish players appear sequentially and act once in the game, based on private noisy observations of the system state and public observation of past players' actions. It has been shown that there exist information cascades where users discard their private information and mimic the action of their predecessor. In this paper, we provide a framework for studying Bayesian learning dynamics in a more general setting than the one described above. In particular, our model incorporates cases where players are non-myopic and strategically participate for the whole duration of the game, and cases where an endogenous process selects which subset of players will act at each time instant. The proposed framework hinges on a sequential decomposition methodology for finding structured perfect Bayesian equilibria (PBE) of a general class of dynamic games with asymmetric information, where user-specific states evolve as conditionally independent Markov processes and users make independent noisy observations of their states. Using this methodology, we study a specific dynamic learning model where players make decisions about public investment based on their estimates of everyone's types. We characterize a set of informational cascades for this problem where learning stops for the team as a whole. We show that in such cascades, all players' estimates of other players' types freeze even though each individual player asymptotically learns its own true type.
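
The simplified model described at the start of the abstract (myopic players who act once, in sequence) already produces cascades. The sketch below simulates it for a binary state and binary private signals of a common accuracy $p > 1/2$: since every informative action shifts the public log-likelihood ratio by the same amount, it suffices to track the difference `d` between revealed 1-signals and 0-signals, and once $|d| \ge 2$ every later action ignores the private signal. The tie-breaking rule (follow your own signal when indifferent) and the function name are assumptions, and this is not the paper's general framework.

```python
from typing import List, Sequence


def sequential_actions(signals: Sequence[int]) -> List[int]:
    """Actions of myopic Bayesian players who act once, in order; signals[k] is 0 or 1."""
    d = 0  # (# signals revealed as 1) - (# revealed as 0), proportional to the public LLR
    actions = []
    for s in signals:
        # Posterior log-likelihood ratio in units of log(p / (1 - p)).
        score = d + (1 if s == 1 else -1)
        if score > 0:
            a = 1
        elif score < 0:
            a = 0
        else:
            a = s  # indifferent: follow own signal (assumed tie-breaking rule)
        actions.append(a)
        if abs(d) < 2:
            # Outside a cascade the action equals the signal, so it reveals it publicly.
            d += 1 if s == 1 else -1
    return actions
```

For signals `[1, 1, 0, 0, 0, 0]`, all six players choose 1: after two revealed 1-signals the remaining players herd, even though most signals are 0.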