Source author record

Avinash Mohan

Avinash Mohan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

eess.SY, Systems and Control, Artificial Intelligence, Machine Learning, Networking and Internet Architecture, Performance

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators


preprint, 2022, arXiv

Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. Both algorithms operate over a class of improper mixtures of the given controllers. For the first case, we derive convergence rate guarantees assuming access to a gradient oracle. For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case. Numerical results on (i) the standard control theoretic benchmark of stabilizing a cartpole; and (ii) a constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.
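As a rough illustration of optimizing over a mixture of base controllers with a gradient oracle, the sketch below runs exact gradient ascent on softmax mixture weights. It is a toy, not the authors' algorithm: the softmax parameterization, the function names, and the fixed per-controller returns in `hypothetical_returns` are all assumptions made for illustration. Because each controller here has a fixed return, the mixture can only concentrate on the best one; the setting in the abstract concerns Markov decision processes, where a mixture may outperform every base controller.

```python
import math

def softmax(theta):
    """Map unconstrained parameters to mixture weights that sum to 1."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_value(theta, returns):
    """Expected return when controller m is picked with probability weights[m]."""
    weights = softmax(theta)
    return sum(w * r for w, r in zip(weights, returns))

def exact_gradient(theta, returns):
    """Gradient of mixture_value w.r.t. theta (plays the role of a gradient oracle)."""
    weights = softmax(theta)
    value = sum(w * r for w, r in zip(weights, returns))
    return [w * (r - value) for w, r in zip(weights, returns)]

def gradient_ascent(returns, steps=500, step_size=0.5):
    """Plain gradient ascent starting from the uniform mixture."""
    theta = [0.0] * len(returns)
    for _ in range(steps):
        grad = exact_gradient(theta, returns)
        theta = [t + step_size * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical expected returns of M = 3 base controllers (illustrative values only).
hypothetical_returns = [0.2, 0.5, 0.9]
theta = gradient_ascent(hypothetical_returns)
weights = softmax(theta)
```

In this toy the weight on the third controller grows at each step, since its return exceeds the current mixture value throughout.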

preprint, 2020, arXiv

On the Volatility of Optimal Control Policies and the Capacity of a Class of Linear Quadratic Regulators

It is well known that highly volatile control laws, while theoretically optimal for certain systems, are undesirable from an engineering perspective, being generally deleterious to the controlled system. In this article we are concerned with the temporal volatility of the control process of the regulator in discrete time Linear Quadratic Regulators (LQRs). Our investigation unearths a surprising connection between the cost functional which an LQR is tasked with minimizing and the temporal variations of its control laws. We first show that optimally controlling the system always implies high levels of control volatility, i.e., it is impossible to reduce volatility in the optimal control process without sacrificing cost. We also show that, akin to communication systems, every LQR has a Capacity Region associated with it, that dictates and quantifies how much cost is achievable at a given level of control volatility. This additionally establishes the fact that no admissible control policy can simultaneously achieve low volatility and low cost. We then employ this analysis to explain the phenomenon of temporal price volatility frequently observed in deregulated electricity markets.
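For readers unfamiliar with the object being studied, here is a minimal sketch of a scalar discrete-time LQR, assuming dynamics $x_{t+1} = a x_t + b u_t$ and stage cost $q x_t^2 + r u_t^2$. The numeric values and function names are illustrative assumptions, and the sketch only computes the standard optimal feedback gain; it does not compute the volatility measure or the capacity region from the paper, which the abstract does not define.

```python
def solve_scalar_riccati(a, b, q, r, iterations=1000):
    """Iterate the scalar discrete-time Riccati recursion to its fixed point p."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

def optimal_gain(a, b, r, p):
    """Gain k of the optimal linear control law u = -k * x."""
    return a * b * p / (r + b * b * p)

# Illustrative parameters: an open-loop unstable system (|a| > 1).
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p = solve_scalar_riccati(a, b, q, r)
k = optimal_gain(a, b, r, p)
closed_loop = a - b * k  # state multiplier under u = -k * x
```

With these values the closed-loop multiplier has magnitude below one, so the optimal feedback stabilizes a system that is unstable without control.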

preprint, 2020, arXiv

Throughput Optimal Decentralized Scheduling with Single-bit State Feedback for a Class of Queueing Systems

Motivated by medium access control for resource-challenged wireless Internet of Things (IoT), we consider the problem of queue scheduling with reduced queue state information. In particular, we consider a time-slotted scheduling model with $N$ sensor nodes, with pair-wise dependence, such that Nodes $i$ and $i + 1$, $0 < i < N$, cannot transmit together. We develop new throughput-optimal scheduling policies requiring only the empty-nonempty state of each queue that we term Queue Nonemptiness-Based (QNB) policies. We propose a Policy Splicing technique to combine scheduling policies for small networks in order to construct throughput-optimal policies for larger networks, some of which also aim for low delay. For $N = 3$, there exists a sum-queue length optimal QNB scheduling policy. We show, however, that for $N > 4$, there is no QNB policy that is sum-queue length optimal over all arrival rate vectors in the capacity region. We then extend our results to a more general class of interference constraints that we call cluster-of-cliques (CoC) conflict graphs. We consider two types of CoC networks, namely, Linear Arrays of Cliques (LAoC) and Star-of-Cliques (SoC) networks. We develop QNB policies for these classes of networks, study their stability and delay properties, and propose and analyze techniques to reduce the amount of state information to be disseminated across the network for scheduling. In the SoC setting, we propose a throughput-optimal policy that only uses information that nodes in the network can glean by sensing activity (or lack thereof) on the channel. Our throughput-optimality results rely on two new arguments: a Lyapunov drift lemma specially adapted to policies that are queue length-agnostic, and a priority queueing analysis for showing strong stability.
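The interference constraint and the single-bit state can be made concrete with a small sketch. The greedy left-to-right rule below is a hypothetical stand-in, not one of the paper's QNB policies, and no throughput-optimality is claimed for it; it only shows a schedule that depends on nothing but each queue's empty or nonempty bit and never activates two adjacent nodes. Function names and the example input are assumptions.

```python
def is_feasible(schedule):
    """True if no two adjacent nodes (i and i + 1) transmit in the same slot."""
    return not any(schedule[i] and schedule[i + 1] for i in range(len(schedule) - 1))

def greedy_nonempty_schedule(nonempty):
    """Pick transmitters using only the one-bit state of each queue.

    Scans nodes left to right and activates a node when its queue is
    nonempty and its left neighbour was not activated.
    """
    schedule = []
    for i, has_packets in enumerate(nonempty):
        left_neighbour_active = i > 0 and schedule[i - 1]
        schedule.append(has_packets and not left_neighbour_active)
    return schedule

# Example slot with N = 5 nodes; True means the queue is nonempty.
nonempty_bits = [True, True, False, True, True]
schedule = greedy_nonempty_schedule(nonempty_bits)
# schedule == [True, False, False, True, False]
```

The rule never reads queue lengths, which is the defining restriction of the QNB class described in the abstract.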
