Source author record

Arnob Ghosh

Arnob Ghosh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Science and Game Theory; Systems and Control; eess.SY; Machine Learning; math.OC; Networking and Internet Architecture; Artificial Intelligence; Multiagent Systems; Information Theory; math.IT; Neural and Evolutionary Computing; Social and Information Networks

Catalog footprint

What is connected

11 works

12 topics

4 close collaborators

preprint, 2023, arXiv

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied by a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves sublinear regret and sublinear constraint violation even in large-scale systems. To this end, we consider episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as linear functions of some known feature mapping. We show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret and $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps. Our bounds are attained without explicitly estimating the unknown transition model or requiring a simulator, and they depend on the state space only through the dimension of the feature mapping. Hence our bounds hold even when the number of states goes to infinity. Our main results are achieved via novel adaptations of the standard LSVI-UCB algorithm. In particular, we first introduce primal-dual optimization into the LSVI-UCB algorithm to balance between regret and constraint violation. More importantly, we replace the standard greedy selection with respect to the state-action function in LSVI-UCB with a soft-max policy. This turns out to be key in establishing uniform concentration for the constrained case via its approximation-smoothness trade-off. We also show that one can achieve even zero constraint violation while still maintaining the same order with respect to $T$.
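The two ingredients named in this abstract, a soft-max policy in place of greedy selection and a primal-dual balance between reward and constraint, can be illustrated with a minimal sketch. It is not taken from the paper: the combined score `Q_r + dual * Q_g`, the projected dual step, and every name and parameter below (`softmax_policy`, `dual_update`, `temperature`, `cap`) are assumptions chosen for illustration only.

```python
import math


def softmax_policy(q_reward, q_utility, dual, temperature):
    """Action probabilities proportional to exp(temperature * (Q_r + dual * Q_g)).

    q_reward / q_utility: per-action estimates for reward and utility (hypothetical inputs).
    dual: current Lagrange multiplier weighting the utility term (assumed form).
    """
    scores = [temperature * (qr + dual * qg) for qr, qg in zip(q_reward, q_utility)]
    peak = max(scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def dual_update(dual, step_size, threshold, utility_value, cap):
    """Projected step: grow the multiplier when the utility estimate is below the threshold."""
    return min(max(dual + step_size * (threshold - utility_value), 0.0), cap)


# With dual = 0 the policy favours the higher-reward action (index 1);
# a large multiplier shifts probability toward the higher-utility action (index 0).
reward_only = softmax_policy([1.0, 2.0], [0.5, 0.0], dual=0.0, temperature=1.0)
utility_heavy = softmax_policy([1.0, 2.0], [0.5, 0.0], dual=10.0, temperature=1.0)
```

A larger `temperature` pushes the soft-max toward the greedy choice, while a smaller one makes the policy smoother; this is one way to read the approximation-smoothness trade-off mentioned above.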

preprint, 2022, arXiv

Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objective of maximizing the received signal strength under an interference constraint. In particular, we employ the non-stationary kernelized bandit to leverage the correlation among beams and model the complex beamforming and multipath channel functions. Furthermore, to mitigate interference to other user equipment, we leverage the primal-dual method to design a constrained UCB-type kernelized bandit algorithm. Our theoretical analysis indicates that the proposed algorithm can adaptively adjust the beam in time-varying environments, such that both the cumulative regret of the received signal and constraint violations have sublinear bounds with respect to time. This result is of independent interest for applications such as adaptive pricing and news ranking. In addition, the algorithm assumes the channel is a black-box function and does not require any prior knowledge for dynamic channel modeling, and thus is applicable in a variety of scenarios. We further show that if the information about the channel variation is known, the algorithm will have better theoretical guarantees and performance. Finally, we conduct simulations to highlight the effectiveness of the proposed algorithm.

preprint, 2022, arXiv

Provably Efficient Model-free RL in Leader-Follower MDP with Linear Function Approximation

We consider a multi-agent episodic MDP setup where an agent (leader) takes an action at each step of the episode, followed by another agent (follower). The state evolution and rewards depend on the joint action pair of the leader and the follower. Such interactions can find applications in many domains such as smart grids, mechanism design, security, and policymaking. We are interested in how to learn policies for both players with provable performance guarantees under a bandit feedback setting. We focus on a setup where both the leader and the follower are non-myopic, i.e., they both seek to maximize their rewards over the entire episode, and consider a linear MDP, which can model continuous state spaces that are very common in many RL applications. We propose a model-free RL algorithm and show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret bounds can be achieved for both the leader and the follower, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps under the bandit feedback information setup. Thus, our result holds even when the number of states becomes infinite. The algorithm relies on a novel adaptation of the LSVI-UCB algorithm. Specifically, we replace the standard greedy policy (as the best response) with the soft-max policy for both the leader and the follower. This turns out to be key in establishing a uniform concentration bound for the value functions. To the best of our knowledge, this is the first sub-linear regret bound guarantee for Markov games with non-myopic followers with function approximation.

preprint, 2021, arXiv

Design of Incentive Mechanisms Using Prospect Theory to Promote Better Sell-back Behavior among Prosumers

Users can now give energy back to the grid using distributed resources. Proper incentive mechanisms are required for such users, also known as prosumers, in order to maximize the sell-back amount while maintaining the retailer's profit. However, all the existing literature considers expected utility theory (EUT), which assumes that prosumers maximize their expected payoff. We consider prospect theory (PT), which better models human behavior in the face of uncertainty. We show that in a day-ahead contract pricing mechanism, the actual optimal value of the contract and the sell-back amount may be smaller than those computed under EUT. We also propose a lottery-based mechanism and show that such a mechanism can increase the sell-back amount while increasing the retailer's savings compared to day-ahead contract pricing.
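To make the contrast between EUT and PT concrete, here is a small sketch that values the same gamble both ways. The abstract does not state which functional forms the paper uses, so the power value function, the inverse-S probability weighting, and the default parameters below (the commonly cited Tversky and Kahneman 1992 estimates) are assumptions, as are all function names. The weighting is applied separately to each outcome rather than cumulatively, which is a simplification.

```python
def pt_value(x, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Power value function: concave for gains, steeper for losses (assumed form)."""
    if x >= 0:
        return x ** alpha
    return -loss_aversion * ((-x) ** beta)


def pt_weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small p, underweights large p."""
    num = p ** gamma
    return num / ((num + (1 - p) ** gamma) ** (1 / gamma))


def expected_payoff(outcomes):
    """EUT-style valuation with linear utility: sum of probability times payoff."""
    return sum(p * x for x, p in outcomes)


def prospect_value(outcomes):
    """PT-style valuation: weighted probabilities times subjective values."""
    return sum(pt_weight(p) * pt_value(x) for x, p in outcomes)


# A fair coin flip over +10 / -10, given as (payoff, probability) pairs.
gamble = [(10.0, 0.5), (-10.0, 0.5)]
eut = expected_payoff(gamble)
pt = prospect_value(gamble)
```

Under these assumptions the gamble is neutral in expectation but has a negative prospect value, because losses are weighted more heavily than equal gains.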

preprint, 2020, arXiv

Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents

We consider a multi-agent Markov strategic interaction over an infinite horizon where agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit when the number of agents of each type becomes infinite. Each agent has a private state; the state evolves depending on the distribution of the state of the agents of different types and the action of the agent. Each agent wants to maximize the discounted sum of rewards over the infinite horizon, which depends on the state of the agent and the distribution of the state of the leaders and followers. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in the above game. We characterize the conditions under which a stationary MMFE exists. Finally, we propose a reinforcement learning (RL) based algorithm using a policy gradient approach to find the stationary MMFE when the agents are unaware of the dynamics. We numerically evaluate how this kind of interaction can model cyber attacks among defenders and adversaries, and show how the RL-based algorithm can converge to an equilibrium.

preprint, 2019, arXiv

DeepPool: Distributed Model-free Algorithm for Ride-sharing using Deep Reinforcement Learning

The success of modern ride-sharing platforms crucially depends on the profit of the ride-sharing fleet operating companies, and how efficiently the resources are managed. Further, ride-sharing allows sharing costs and, hence, reduces congestion and emissions by making better use of vehicle capacities. In this work, we develop a distributed model-free algorithm, DeepPool, that uses deep Q-network (DQN) techniques to learn optimal dispatch policies by interacting with the environment. Further, DeepPool efficiently incorporates travel demand statistics and deep learning models to manage dispatching vehicles for improved ride-sharing services. Using a real-world dataset of taxi trip records in New York City, DeepPool performs better than other strategies proposed in the literature that do not consider ride sharing or do not dispatch the vehicles to regions where future demand is anticipated. Finally, DeepPool can adapt rapidly to dynamic environments since it is implemented in a distributed manner in which each vehicle solves its own DQN individually without coordination.

preprint, 2016, arXiv

Menu-Based Pricing for Charging of Electric Vehicles with Vehicle-to-Grid Service

The paper considers a bidirectional power flow model of the electric vehicles (EVs) in a charging station. The EVs can inject energy by discharging via a Vehicle-to-Grid (V2G) service, which can enhance the profits of the charging station. However, frequent charging and discharging degrade battery life. Proper compensation needs to be paid to the users to participate in the V2G service. We propose a menu-based pricing scheme, where the charging station selects a price for each arriving user for the amount of battery utilization, the total energy, and the time (deadline) that the EV will stay. The user can accept one of the contracts or reject all, depending on their utilities. The charging station can serve users using a combination of renewable energy and conventional energy bought from the grid. We show that though there exists a profit-maximizing price which maximizes the social welfare, it provides no surplus to the users if the charging station is aware of the utilities of the users. If the charging station is not aware of the exact utilities, the social welfare maximizing price may not maximize the expected profit. In fact, it can give zero profit. We propose a pricing strategy which provides a guaranteed fixed profit to the charging station and also maximizes the expected profit for a wide range of utility functions. Our analysis shows that when the harvested renewable energy is small, the users have higher incentives for the V2G service. We show numerically that the charging station's profit and the user's surplus both increase as V2G service is efficiently utilized by the pricing mechanism.

preprint, 2016, arXiv

Strategic Interaction Among Different Entities in Internet of Things

The economic model of the Internet of Things (IoT) consists of end users, advertisers and three different kinds of providers: IoT service provider (IoTSP), wireless service provider (WSP) and cloud service provider (CSP). We investigate three different kinds of interactions among the providers. First, we consider that the IoTSP prices a bundled service to the end-users, and the WSP and CSP pay the IoTSP (push model). Next, we consider the model where the end-users independently pay each provider (pull model). Finally, we consider a hybrid model of the above two where the IoTSP and WSP quote their prices to the end-users, but the CSP quotes its price to the IoTSP. We characterize and quantify the impact of the advertisement revenue on the equilibrium pricing strategy and payoff of providers, and corresponding demands of end users in each of the above interaction models. Our analysis reveals that the demand of end-users and the payoffs of the providers are non-decreasing functions of the advertisement revenue. For sufficiently high advertisement revenue, the IoTSP will offer its service free of cost in each interaction model. However, the payoffs of the providers and the demand of end-users vary across different interaction models. Our analysis shows that the demand of end-users and the payoff of the WSP are the highest in the pull (push, resp.) model in the low (high, resp.) advertisement revenue regime. The payoff of the IoTSP is always higher in the pull model irrespective of the advertisement revenue. The payoff of the CSP is the highest in the hybrid model in the low advertisement revenue regime. However, in the high advertisement revenue regime the payoff of the CSP in the hybrid model or in the push model can be higher depending on the equilibrium chosen in the push model.

preprint, 2016, arXiv

The value of Side Information in Secondary Spectrum Markets

In a secondary spectrum market, primaries set prices for their unused channels to the secondaries. The payoff of a primary depends on the availability of unused channels of its competitors. We consider a model where a primary can acquire its competitor's channel state information (C-CSI) at a cost. We formulate a game between two primaries where each primary decides whether to acquire C-CSI or not and then selects its price based on that. We first characterize the Nash Equilibrium (NE) of this game for a symmetric model where the C-CSI is perfect. We show that the payoff of a primary is independent of the C-CSI acquisition cost. We then generalize our analysis to allow for imperfect estimation and cases where the two primaries have different C-CSI costs or different channel availabilities. Interestingly, our results show that the payoff of a primary increases when there is estimation error. Surprisingly, we also show that the expected payoff of a primary may decrease when the C-CSI acquisition cost decreases if primaries have different availabilities.

preprint, 2015, arXiv

Quality Sensitive Price Competition in Spectrum Oligopoly: Part II

We investigate a spectrum oligopoly market where each primary seeks to sell secondary access to its channel at multiple locations. Transmission qualities of a channel evolve randomly. Each primary needs to select a price and a set of non-interfering locations (which is an independent set in the conflict graph of the region) at which to offer its channel without knowing the transmission qualities of the channels of its competitors. We formulate the above problem as a non-cooperative game. We consider two scenarios: (i) when the region is small and (ii) when the region is large. In the first setting, we focus on a class of conflict graphs, known as mean valid graphs, which commonly arise when the region is small. We explicitly compute a symmetric Nash equilibrium (NE); the NE is threshold type in that primaries only choose independent sets whose cardinality is greater than a certain threshold. The threshold on the cardinality increases with the quality of the channel on sale. We show that the symmetric NE strategy profile is unique in a special class of conflict graphs (linear graph). In the second setting, we consider node-symmetric conflict graphs, which arise when the number of locations is large (potentially, infinite). We explicitly compute a symmetric NE that randomizes equally among the maximum independent sets at a given channel state vector. In the NE a primary only selects the maximum independent set at a given channel state vector. We show that the two symmetric NEs computed in the two settings exhibit important structural differences.
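The notion of a set of non-interfering locations as an independent set in a conflict graph can be checked with a few lines of code. This is only an illustrative sketch: the edge-list representation, the function name `is_independent_set`, and the four-location linear graph used as the example are assumptions, not details from the paper.

```python
def is_independent_set(locations, conflict_edges):
    """Return True if no two chosen locations are joined by a conflict edge."""
    chosen = set(locations)
    return not any(u in chosen and v in chosen for u, v in conflict_edges)


# Hypothetical linear conflict graph: locations 0-1-2-3, neighbours interfere.
linear_graph = [(0, 1), (1, 2), (2, 3)]

alternating = is_independent_set({0, 2}, linear_graph)  # non-adjacent locations
adjacent = is_independent_set({1, 2}, linear_graph)     # neighbouring locations
```

In this example a primary could offer its channel at locations 0 and 2 simultaneously, but not at 1 and 2.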

preprint, 2015, arXiv

Quality Sensitive Price Competition in Spectrum Oligopoly: Part 1

We investigate a spectrum oligopoly market where primaries lease their channels to secondaries in exchange for financial remuneration. Transmission quality of a channel evolves randomly. Each primary has to select the price it would quote without knowing the transmission qualities of its competitors' channels. Each secondary buys a channel depending on the price and the transmission quality a channel offers. We formulate the price selection problem as a non-cooperative game with primaries as players. In the one-shot game, we show that there exists a unique symmetric Nash Equilibrium (NE) strategy profile and explicitly compute it. Our analysis reveals that under the NE strategy profile a primary prices its channel to render a high-quality channel more preferable to the secondary; this negates the popular belief that prices ought to be selected to render channels equally preferable to the secondary regardless of their qualities. We show the loss of revenue in the asymptotic limit due to the non-cooperation of primaries. In the repeated version of the game, we characterize a subgame perfect NE where a primary can attain a payoff arbitrarily close to the payoff it would obtain when primaries cooperate.
