Researcher profile

Rajesh K Mishra

Rajesh K Mishra contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
6topics
2close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Master equation of discrete time graphon mean field games and teams

In this paper, we present a sequential decomposition algorithm equivalent of Master equation to compute GMFE of GMFG and graphon optimal Markovian policies (GOMPs) of graphon mean field teams (GMFTs). We consider a large population of players sequentially making strategic decisions where the actions of each player affect their neighbors which is captured in a graph, generated by a known graphon. Each player observes a private state and also a common information as a graphon mean-field population state which represents the empirical networked distribution of other players' types. We consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute both GMFE and GOMP that depend on both, a player's private type, and the current (dynamic) population state determined through the graphon. Each step in computing GMFE consists of solving a fixed-point equation, while computing GOMP involves solving for an optimization problem. We provide conditions on model parameters for which there exists such a GMFE. Using this algorithm, we obtain the GMFE and GOMP for a specific security setup in cyber physical systems for different graphons that capture the interactions between the nodes in the system.

preprint2020arXiv

Decentralized multi-agent reinforcement learning with shared actions

In this paper, we propose a novel model-free reinforcement learning algorithm to compute the optimal policies for a multi-agent system with $N$ cooperative agents where each agent privately observes it's own private type and publicly observes each others' actions. The goal is to maximize their collective reward. The problem belongs to the broad class of decentralized control problems with partial information. We use the common agent approach wherein some fictitious common agent picks the best policy based on a belief on the current states of the agents. These beliefs are updated individually for each agent from their current belief and action histories. Belief state updates without the knowledge of system dynamics is a challenge. In this paper, we employ particle filters called the bootstrap filter distributively across agents to update the belief. We provide a model-free reinforcement learning (RL) method for this multi-agent partially observable Markov decision processes using the particle filter and sampled trajectories to estimate the optimal policies for the agents. We showcase our results with the help of a smartgrid application where the users strive to reduce collective cost of power for all the agents in the grid. Finally, we compare the performances for model and model-free implementation of the RL algorithm establishing the effectiveness of particle filter (pf) method.

preprint2020arXiv

Model-free Reinforcement Learning for Non-stationary Mean Field Games

In this paper, we consider a finite horizon, non-stationary, mean field games (MFG) with a large population of homogeneous players, sequentially making strategic decisions, where each player is affected by other players through an aggregate population state termed as mean field state. Each player has a private type that only it can observe, and a mean field population state representing the empirical distribution of other players' types, which is shared among all of them. Recently, authors in [1] provided a sequential decomposition algorithm to compute mean field equilibrium (MFE) for such games which allows for the computation of equilibrium policies for them in linear time than exponential, as before. In this paper, we extend it for the case when state transitions are not known, to propose a reinforcement learning algorithm based on Expected Sarsa with a policy gradient approach that learns the MFE policy by learning the dynamics of the game simultaneously. We illustrate our results using cyber-physical security example.

preprint2020arXiv

Model-free Reinforcement Learning for Stochastic Stackelberg Security Games

In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy which maximizes its own returns given the knowledge that the follower is going to play the best response to its policy. Thus, both players converge to a pair of policies that form the Stackelberg equilibrium of the game. Recently,~[1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium for such games which allow for the computation of Markovian equilibrium policies in linear time as opposed to double exponential, as before. In this paper, we extend the idea to an MDP whose dynamics are not known to the players, to propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm. by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm.