Source author record

Jhelum Chakravorty

Jhelum Chakravorty appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, eess.SY, Information Theory, Machine Learning, math.IT, math.OC, Multiagent Systems, Systems and Control

Catalog footprint

What is connected

3 works

8 topics

4 close collaborators


preprint · 2020 · arXiv

oIRL: Robust Adversarial Inverse Reinforcement Learning with Temporally Extended Actions

Explicit engineering of reward functions for given environments has been a major hindrance to reinforcement learning methods. While Inverse Reinforcement Learning (IRL) is a solution to recover reward functions from demonstrations only, these learned rewards are generally heavily entangled with the dynamics of the environment and therefore not portable or robust to changing environments. Modern adversarial methods have yielded some success in reducing reward entanglement in the IRL setting. In this work, we leverage one such method, Adversarial Inverse Reinforcement Learning (AIRL), to propose an algorithm that learns hierarchical disentangled rewards with a policy over options. We show that this method has the ability to learn generalizable policies and reward functions in complex transfer learning tasks, while yielding results in continuous control benchmarks that are comparable to those of the state-of-the-art methods.
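As a rough illustration of the AIRL building block the abstract refers to, the sketch below computes the standard AIRL discriminator, D = exp(f) / (exp(f) + pi(a|s)), and the reward log D - log(1 - D) derived from it. It covers only the flat AIRL form, not the option-based extension proposed in the paper; the function names and the scalar inputs `f_value` and `log_pi` are assumptions made for this example.

```python
import math


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """Standard AIRL discriminator D = exp(f) / (exp(f) + pi(a|s)).

    Algebraically this equals a sigmoid of (f - log pi), which is the
    numerically safer form used here. Both arguments are hypothetical
    scalars standing in for a learned function f and a policy log-prob.
    """
    return 1.0 / (1.0 + math.exp(-(f_value - log_pi)))


def airl_reward(f_value: float, log_pi: float) -> float:
    """Reward signal log D - log(1 - D), which simplifies to f - log pi."""
    d = airl_discriminator(f_value, log_pi)
    return math.log(d) - math.log(1.0 - d)


# Example with made-up numbers: f(s, a) = 0.7 and pi(a|s) = 0.25.
example_reward = airl_reward(0.7, math.log(0.25))
```

The reward reduces to f minus the policy log-probability, which is why the learned f can be read as an entropy-regularized reward estimate.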

preprint · 2020 · arXiv

Option-Critic in Cooperative Multi-agent Systems

In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems, using the options framework (Sutton et al., 1999). First, we address the planning problem for the decentralized POMDP represented by the multi-agent system, by introducing a common information approach. We use the notion of common beliefs and broadcasting to solve an equivalent centralized POMDP problem. Then, we propose the Distributed Option Critic (DOC) algorithm, which uses centralized option evaluation and decentralized intra-option improvement. We theoretically analyze the asymptotic convergence of DOC and build a new multi-agent environment to demonstrate its validity. Our experiments empirically show that DOC performs competitively against baselines and scales with the number of agents.
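For readers unfamiliar with the options framework mentioned in the abstract, here is a minimal single-agent sketch of an option (initiation set, intra-option policy, termination probability) executed in call-and-return fashion. It is not the DOC algorithm and involves no learning or common beliefs; the `Option` class, `run_option`, and the toy chain environment are all hypothetical names invented for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Option:
    can_start: Callable[[int], bool]     # initiation set: may the option start here?
    policy: Callable[[int], int]         # intra-option policy: state -> action
    termination: Callable[[int], float]  # probability of terminating in a state


def run_option(option: Option, state: int, step: Callable[[int, int], int],
               rng: random.Random, max_steps: int = 100) -> List[int]:
    """Call-and-return execution: follow the option until it terminates."""
    assert option.can_start(state), "option is not available in this state"
    trajectory = [state]
    for _ in range(max_steps):
        state = step(state, option.policy(state))
        trajectory.append(state)
        if rng.random() < option.termination(state):
            break
    return trajectory


def chain_step(state: int, action: int) -> int:
    """Toy environment (assumed): a chain of states 0..10, action is -1 or +1."""
    return max(0, min(10, state + action))


# An option that walks right and terminates only at the end of the chain.
go_right = Option(
    can_start=lambda s: s < 10,
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 10 else 0.0,
)
trajectory = run_option(go_right, 3, chain_step, random.Random(0))
```

A policy over options would pick the next option each time `run_option` returns; in the multi-agent setting of the abstract, each agent would hold its own options.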

preprint · 2014 · arXiv

Distortion-transmission trade-off in real-time transmission of Markov sources

The problem of optimal real-time transmission of a Markov source under constraints on the expected number of transmissions is considered, both for the discounted and long term average cases. This setup is motivated by applications where transmission is sporadic and the cost of switching on the radio and transmitting is significantly more important than the size of the transmitted data packet. For this model, we characterize the distortion-transmission function, i.e., the minimum expected distortion that can be achieved when the expected number of transmissions is less than or equal to a particular value. In particular, we show that the distortion-transmission function is a piecewise linear, convex, and decreasing function. We also give an explicit characterization of each vertex of the piecewise linear function. To prove the results, the optimization problem is cast as a decentralized constrained stochastic control problem. We first consider the Lagrange relaxation of the constrained problem and identify the structure of optimal transmission and estimation strategies. In particular, we show that the optimal transmission is of a threshold type. Using these structural results, we obtain dynamic programs for the Lagrange relaxations. We identify the performance of an arbitrary threshold-type transmission strategy and use the idea of calibration from multi-armed bandits to determine the optimal transmission strategy for the Lagrange relaxation. Finally, we show that the optimal strategy for the constrained setup is a randomized strategy that randomizes between two deterministic strategies that differ only at one state. By evaluating the performance of these strategies, we determine the shape of the distortion-transmission function. These results are illustrated using an example of transmitting a birth-death Markov source.
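The trade-off described in the abstract can be illustrated with a small simulation of threshold-type transmission. The sketch assumes a symmetric birth-death source on the integers (up or down by one with probability `p` each), absolute-error distortion, and a receiver that holds the last received value. None of these specifics come from the abstract, and the code does not compute the optimal thresholds or the randomized strategy; `simulate_threshold` is a hypothetical helper.

```python
import random
from typing import Tuple


def simulate_threshold(k: int, p: float = 0.3, steps: int = 100_000,
                       seed: int = 0) -> Tuple[float, float]:
    """Return (average distortion, transmission rate) for threshold k.

    The transmitter sends the source value whenever the estimation error
    reaches magnitude k; a transmission resets the error to zero.
    """
    rng = random.Random(seed)
    error = 0            # source value minus the receiver's current estimate
    transmissions = 0
    total_distortion = 0
    for _ in range(steps):
        u = rng.random()
        if u < p:
            error += 1   # birth
        elif u < 2 * p:
            error -= 1   # death
        if abs(error) >= k:
            transmissions += 1
            error = 0    # receiver now knows the source value exactly
        total_distortion += abs(error)
    return total_distortion / steps, transmissions / steps


# Sweep thresholds to trace points on a distortion versus transmission curve.
curve = {k: simulate_threshold(k) for k in range(1, 6)}
for k, (distortion, rate) in curve.items():
    print(f"threshold={k}  distortion={distortion:.3f}  rate={rate:.3f}")
```

Larger thresholds trade fewer transmissions for higher distortion. Under the abstract's result, randomizing between two deterministic strategies that differ at one state would fill in the points between those of neighboring thresholds.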