Source author record

Srinivas Shakkottai

Srinivas Shakkottai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Networking and Internet Architecture, Computer Science and Game Theory, math.OC, Artificial Intelligence, eess.SY, Multimedia, Social and Information Networks, Systems and Control, eess.IV, Information Theory, math.IT, Multiagent Systems, Performance, physics.soc-ph

Catalog footprint

What is connected

13 works

15 topics

4 close collaborators

Preprint, 2022, arXiv

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

The Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandit by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning for the training of NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance for three recently studied restless bandit problems. Our experimental results show that the performance of NeurWIN is significantly better than that of other RL algorithms.

Preprint, 2022, arXiv

OpenGridGym: An Open-Source AI-Friendly Toolkit for Distribution Market Simulation

This paper presents OpenGridGym, an open-source Python-based package that allows for seamless integration of distribution market simulation with state-of-the-art artificial intelligence (AI) decision-making algorithms. We present the architecture and design choices for the proposed framework, elaborate on how users interact with OpenGridGym, and highlight its value by providing multiple cases to demonstrate its use. Four modules are used in any simulation: (1) the physical grid, (2) market mechanisms, (3) a set of trainable agents which interact with the former two modules, and (4) an environment module that connects and coordinates the above three. We provide templates for each of those four, but they are easily interchangeable with custom alternatives. Several case studies are presented to illustrate the capability and potential of this toolkit in helping researchers address key design and operational questions in distribution electricity markets.

Preprint, 2022, arXiv

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step by using the offline demonstration data. The key idea is that by obtaining guidance from (not imitating) the offline data, LOGO orients its policy in the manner of the sub-optimal policy, while still being able to learn beyond and approach optimality. We provide a theoretical analysis of our algorithm, and provide a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.

Preprint, 2021, arXiv

Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy (in both objective maximization and constraint satisfaction) in a PAC sense. We explore two classes of RL algorithms, namely, (i) a generative model based approach, wherein samples are taken initially to estimate a model, and (ii) an online approach, wherein the model is updated as samples are obtained. Our main finding is that compared to the best known bounds of the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.
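For orientation, one common way to write a CMDP with K constraints is sketched below. It assumes a discounted, infinite-horizon setting, which the abstract does not specify; the symbols (discount factor \gamma, reward r, constraint costs c_i, thresholds b_i) are illustrative and not taken from the paper.

```latex
\max_{\pi} \; \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{subject to} \quad
\mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t) \right] \le b_i,
\qquad i = 1, \dots, K .
```

In this notation, the abstract's main finding reads as a sample-complexity increase by a factor logarithmic in K relative to the unconstrained case.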

Preprint, 2021, arXiv

Reinforcement Learning for Mean Field Games with Strategic Complementarities

Mean Field Games (MFG) are a class of games with a very large number of agents, and the standard equilibrium concept is a Mean Field Equilibrium (MFE). Algorithms for learning MFE in dynamic MFGs are unknown in general. Our focus is on an important subclass that possesses a monotonicity property called Strategic Complementarities (MFG-SC). We introduce a natural refinement to the equilibrium concept that we call Trembling-Hand-Perfect MFE (T-MFE), which allows agents to employ a measure of randomization while accounting for the impact of such randomization on their payoffs. We propose a simple algorithm for computing T-MFE under a known model. We also introduce a model-free and a model-based approach to learning T-MFE and provide sample complexities of both algorithms. We also develop a fully online learning scheme that obviates the need for a simulator. Finally, we empirically evaluate the performance of the proposed algorithms via examples motivated by real-world applications.

Preprint, 2020, arXiv

Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms

Crucial performance metrics of a caching algorithm include its ability to quickly and accurately learn a popularity distribution of requests. However, a majority of work on analytical performance analysis focuses on hit probability after an asymptotically large time has elapsed. We consider an online learning viewpoint, and characterize the "regret" in terms of the finite time difference between the hits achieved by a candidate caching algorithm with respect to a genie-aided scheme that places the most popular items in the cache. We first consider the Full Observation regime wherein all requests are seen by the cache. We show that the Least Frequently Used (LFU) algorithm is able to achieve order optimal regret, which is matched by an efficient counting algorithm design that we call LFU-Lite. We then consider the Partial Observation regime wherein only requests for items currently cached are seen by the cache, making it similar to an online learning problem related to the multi-armed bandit problem. We show how approaching this "caching bandit" using traditional approaches yields either high complexity or regret, but a simple algorithm design that exploits the structure of the distribution can ensure order optimal regret. We conclude by illustrating our insights using numerical simulations.
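The Full Observation regime described above can be illustrated with a small simulation. The sketch below is not the paper's code and does not implement LFU-Lite; it assumes i.i.d. requests drawn from a fixed popularity vector, and the function name `lfu_regret` and its parameters are hypothetical. It reports the realized difference in hits between a genie-aided cache and a plain LFU cache on one sample path.

```python
import random
from collections import Counter

def lfu_regret(popularity, cache_size, horizon, seed=0):
    """Realized regret (genie hits minus LFU hits) on one sample path.

    Assumptions (not from the source): requests are i.i.d. draws from
    `popularity`, and the cache sees every request (full observation).
    """
    rng = random.Random(seed)
    items = list(range(len(popularity)))
    # Genie-aided scheme: always holds the truly most popular items.
    genie = set(sorted(items, key=lambda i: -popularity[i])[:cache_size])
    counts = Counter()
    # Arbitrary initial cache contents: the last `cache_size` item ids.
    cache = set(items[len(items) - cache_size:])
    lfu_hits = genie_hits = 0
    for _ in range(horizon):
        req = rng.choices(items, weights=popularity)[0]
        lfu_hits += req in cache
        genie_hits += req in genie
        counts[req] += 1
        # LFU: keep the items with the highest empirical request counts
        # (ties broken by item id).
        cache = set(sorted(items, key=lambda i: (-counts[i], i))[:cache_size])
    return genie_hits - lfu_hits
```

Because this is a single sample path, the returned value can fluctuate and may even be negative; averaging over many seeds would approximate the expected regret the abstract refers to.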

Preprint, 2020, arXiv

QFlow: A Learning Approach to High QoE Video Streaming at the Wireless Edge

The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose. However, current access networks treat all packets identically, and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adoption, and this in turn implies that agile control policies can now be instantiated on access networks. The goal of this work is to design, develop and demonstrate QFlow, a learning approach to create a value chain from the application on one side, to algorithms operating over reconfigurable infrastructure on the other, so that applications are able to obtain necessary resources for optimal performance. Using YouTube video streaming as an example, we illustrate how QFlow is able to adaptively provide such resources and attain a high QoE for all clients at a wireless access point.

Preprint, 2016, arXiv

Incentivizing Sharing in Realtime D2D Streaming Networks: A Mean Field Game Perspective

We consider the problem of streaming live content to a cluster of co-located wireless devices that have both an expensive unicast base-station-to-device (B2D) interface, as well as an inexpensive broadcast device-to-device (D2D) interface, which can be used simultaneously. Our setting is a streaming system that uses a block-by-block random linear coding approach to achieve a target percentage of on-time deliveries with minimal B2D usage. Our goal is to design an incentive framework that would promote such cooperation across devices, while ensuring good quality of service. Based on ideas drawn from truth-telling auctions, we design a mechanism that achieves this goal via appropriate transfers (monetary payments or rebates) in a setting with a large number of devices, and with peer arrivals and departures. Here, we show that a Mean Field Game can be used to accurately approximate our system. Furthermore, the complexity of calculating the best responses under this regime is low. We implement the proposed system on an Android testbed, and illustrate its efficient performance using real world experiments.

Preprint, 2013, arXiv

Opportunities for Network Coding: To Wait or Not to Wait

It has been well established that wireless network coding can significantly improve the efficiency of multi-hop wireless networks. However, in a stochastic environment some of the packets might not have coding pairs, which limits the number of available coding opportunities. In this context, an important decision is whether to delay packet transmission in the hope that a coding pair will be available in the future or transmit a packet without coding. The paper addresses this problem by formulating a stochastic dynamic program whose objective is to minimize the long-run average cost per unit time incurred due to transmissions and delays. In particular, we identify optimal control actions that balance the costs of transmission against the costs incurred due to the delays. Moreover, we seek to address a crucial question: what should be observed as the state of the system? We analytically show that observing queue lengths suffices if the system can be modeled as a Markov decision process. We also show that a stationary threshold type policy based on queue lengths is optimal. We further substantiate our results with simulation experiments for more generalized settings.
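The wait-or-transmit trade-off can be sketched with a toy slotted simulation of a queue-length threshold policy. Everything in the model below is an assumption made for illustration (two flows with Bernoulli arrivals at a relay, unit transmission cost, linear holding cost, the name `average_cost`); it is not the paper's formulation.

```python
import random

def average_cost(threshold, p1, p2, slots, tx_cost=1.0, hold_cost=0.1, seed=0):
    """Long-run average cost per slot under a queue-length threshold policy.

    Toy model (assumed, not from the source): in each slot a packet arrives
    to queue i with probability p_i. If both queues are non-empty, one coded
    transmission serves a packet from each. Otherwise the relay sends an
    uncoded packet only when the longer queue exceeds `threshold`.
    """
    rng = random.Random(seed)
    q = [0, 0]
    total = 0.0
    for _ in range(slots):
        q[0] += rng.random() < p1
        q[1] += rng.random() < p2
        if q[0] > 0 and q[1] > 0:
            # Coding opportunity: one transmission clears a packet from each queue.
            q[0] -= 1
            q[1] -= 1
            total += tx_cost
        else:
            longer = 0 if q[0] >= q[1] else 1
            if q[longer] > threshold:
                # Stop waiting: transmit without coding.
                q[longer] -= 1
                total += tx_cost
        # Delay cost on every packet still waiting at the end of the slot.
        total += hold_cost * (q[0] + q[1])
    return total / slots
```

Sweeping `threshold` for fixed arrival probabilities gives a rough picture of how waiting longer trades delay cost for fewer uncoded transmissions.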

Preprint, 2012, arXiv

Incentives for P2P-Assisted Content Distribution: If You Can't Beat 'Em, Join 'Em

The rapid growth of content distribution on the Internet has brought with it proportional increases in the costs of distributing content. Adding to distribution costs is the fact that digital content is easily duplicable, and hence can be shared in an illicit peer-to-peer (P2P) manner that generates no revenue for the content provider. In this paper, we study whether the content provider can recover lost revenue through a more innovative approach to distribution. In particular, we evaluate the benefits of a hybrid revenue-sharing system that combines a legitimate P2P swarm and a centralized client-server approach. We show how the revenue recovered by the content provider using a server-supported legitimate P2P swarm can exceed that of the monopolistic scheme by an order of magnitude. Our analytical results are obtained in a fluid model, and supported by stochastic simulations.

Preprint, 2010, arXiv

Access-Network Association Policies for Media Streaming in Heterogeneous Environments

We study the design of media streaming applications in the presence of multiple heterogeneous wireless access methods with different throughputs and costs. Our objective is to analytically characterize the trade-off between the usage cost and the Quality of user Experience (QoE), which is represented by the probability of interruption in media playback and the initial waiting time. We model each access network as a server that provides packets to the user according to a Poisson process with a certain rate and cost. Blocks are coded using random linear codes to alleviate the duplicate packet reception problem. Users must take decisions on how many packets to buffer before playout, and which networks to access during playout. We design, analyze and compare several control policies with a threshold structure. We formulate the problem of finding the optimal control policy as an MDP with a probabilistic constraint. We present the HJB equation for this problem by expanding the state space, and exploit it as a verification method for optimality of the proposed control law.

Preprint, 2010, arXiv

Avoiding Interruptions - QoE Trade-offs in Block-coded Streaming Media Applications

We take an analytical approach to study Quality of user Experience (QoE) for video streaming applications. First, we show that random linear network coding applied to blocks of video frames can significantly simplify the packet requests at the network layer and save resources by avoiding duplicate packet reception. Network coding allows us to model the receiver's buffer as a queue with Poisson arrivals and deterministic departures. We consider the probability of interruption in video playback as well as the number of initially buffered packets (initial waiting time) as the QoE metrics. We characterize the optimal trade-off between these metrics by providing upper and lower bounds on the minimum initial buffer size required to achieve a certain level of interruption probability for different regimes of the system parameters. Our bounds are asymptotically tight as the file size goes to infinity.
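The buffer model in this abstract (Poisson arrivals, deterministic departures) lends itself to a quick Monte Carlo estimate of the interruption probability. The sketch below assumes one packet is played per unit of time and that playback starts when the initial buffer is filled; the function name `interruption_probability` and its parameters are hypothetical, and the paper's analytical bounds are not reproduced here.

```python
import random

def interruption_probability(lam, initial_buffer, file_size, runs=2000, seed=0):
    """Monte Carlo estimate of the probability that playback is interrupted.

    Assumptions (for illustration): packets arrive as a Poisson process of
    rate `lam`, playback starts when `initial_buffer` packets have arrived,
    and one packet is consumed per unit of time thereafter.
    """
    rng = random.Random(seed)
    interrupted = 0
    for _ in range(runs):
        # Arrival times of all packets: cumulative sums of exponential gaps.
        t = 0.0
        arrivals = []
        for _ in range(file_size):
            t += rng.expovariate(lam)
            arrivals.append(t)
        start = arrivals[initial_buffer - 1] if initial_buffer > 0 else 0.0
        # Packet k is due at time start + k; playback stalls if it is late.
        if any(arrivals[k] > start + k for k in range(file_size)):
            interrupted += 1
    return interrupted / runs
```

For example, calling `interruption_probability(1.2, d, 200)` for increasing `d` shows how a larger initial buffer (longer initial wait) lowers the estimated interruption probability.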

Preprint, 2010, arXiv

Evolution of the Internet AS-Level Ecosystem

We present an analytically tractable model of Internet evolution at the level of Autonomous Systems (ASs). We call our model the multiclass preferential attachment (MPA) model. As its name suggests, it is based on preferential attachment. All of its parameters are measurable from available Internet topology data. Given the estimated values of these parameters, our analytic results predict a definitive set of statistics characterizing the AS topology structure. These statistics are not part of the model formulation. The MPA model thus closes the "measure-model-validate-predict" loop, and provides further evidence that preferential attachment is a driving force behind Internet evolution.
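As background for the mechanism the MPA model builds on, here is a minimal sketch of plain single-class preferential attachment. It is not the multiclass model from the paper; the function name `preferential_attachment` and the one-edge-per-new-node growth rule are assumptions chosen for brevity.

```python
import random

def preferential_attachment(n_nodes, seed=0):
    """Grow a graph in which each new node links to one existing node chosen
    with probability proportional to its current degree.

    Plain preferential attachment only (an illustrative simplification, not
    the multiclass MPA model). Returns the list of node degrees.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]   # each node appears once per unit of degree
    for new in range(2, n_nodes):
        # Uniform choice over edge endpoints is degree-proportional over nodes.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints += [target, new]
    return degree
```

Sorting the returned degrees typically shows a few high-degree hubs and many degree-one nodes, the heavy-tailed pattern usually associated with preferential attachment.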
