Source author record

Abishek Sankararaman

Abishek Sankararaman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Information Theory math.IT math.PR Social and Information Networks Artificial Intelligence Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

8works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits

We consider the problem of minimizing regret in an $N$ agent heterogeneous stochastic linear bandits framework, where the agents (users) are similar but not all identical. We model user heterogeneity using two popularly used ideas in practice; (i) A clustering framework where users are partitioned into groups with users in the same group being identical to each other, but different across groups, and (ii) a personalization framework where no two users are necessarily identical, but a user's parameters are close to that of the population average. In the clustered users' setup, we propose a novel algorithm, based on successive refinement of cluster identities and regret minimization. We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$, if the agent is in a `well separated' cluster, or scales as $\mathcal{O}(T^{\frac{1}{2} + \varepsilon}/(N)^{\frac{1}{2} -\varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to $0$. Our algorithm is adaptive to the cluster separation, and is parameter free -- it does not need to know the number of clusters, separation and cluster size, yet the regret guarantee adapts to the inherent complexity. In the personalization framework, we introduce a natural algorithm where, the personal bandit instances are initialized with the estimates of the global average model. We show that, an agent $i$ whose parameter deviates from the population average by $ε_i$, attains a regret scaling of $\widetilde{O}(ε_i\sqrt{T})$. This demonstrates that if the user representations are close (small $ε_i)$, the resulting regret is low, and vice-versa. The results are empirically validated and we observe superior performance of our adaptive algorithms over non-adaptive baselines.

preprint2022arXiv

Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits

We prove an instance independent (poly) logarithmic regret for stochastic contextual bandits with linear payoff. Previously, in \cite{chu2011contextual}, a lower bound of $\mathcal{O}(\sqrt{T})$ is shown for the contextual linear bandit problem with arbitrary (adversarily chosen) contexts. In this paper, we show that stochastic contexts indeed help to reduce the regret from $\sqrt{T}$ to $\polylog(T)$. We propose Low Regret Stochastic Contextual Bandits (\texttt{LR-SCB}), which takes advantage of the stochastic contexts and performs parameter estimation (in $\ell_2$ norm) and regret minimization simultaneously. \texttt{LR-SCB} works in epochs, where the parameter estimation of the previous epoch is used to reduce the regret of the current epoch. The (poly) logarithmic regret of \texttt{LR-SCB} stems from two crucial facts: (a) the application of a norm adaptive algorithm to exploit the parameter estimation and (b) an analysis of the shifted linear contextual bandit algorithm, showing that shifting results in increasing regret. We have also shown experimentally that stochastic contexts indeed incurs a regret that scales with $\polylog(T)$.

preprint2022arXiv

Decentralized Competing Bandits in Non-Stationary Matching Markets

Understanding complex dynamics of two-sided online matching markets, where the demand-side agents compete to match with the supply-side (arms), has recently received substantial interest. To that end, in this paper, we introduce the framework of decentralized two-sided matching market under non stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side agents have unknown and different preferences over the supply-side (arms), but the arms have fixed and known preference over the agents. We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (\texttt{DNCB}), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms. The complexity in understanding such a system stems from the fact that the competing bandits choose their actions in an asynchronous fashion, and the lower ranked agents only get to learn from a set of arms, not \emph{dominated} by the higher ranked agents, which leads to \emph{forced exploration}. With carefully defined complexity parameters, we characterize this \emph{forced exploration} and obtain sub-linear (logarithmic) regret of \texttt{DNCB}. Furthermore, we validate our theoretical findings via experiments.

preprint2022arXiv

Multi-Agent Low-Dimensional Linear Bandits

We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $θ^* \in \mathbb{R}^d$. The side information consists of a finite collection of low-dimensional subspaces, one of which contains $θ^*$. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm, where agents communicate subspace indices with each other and each agent plays a projected variant of LinUCB on the corresponding (low-dimensional) subspace. By distributing the search for the optimal subspace across users and learning of the unknown vector by each agent in the corresponding low-dimensional subspace, we show that the per-agent finite-time regret is much smaller than the case when agents do not communicate. We finally complement these results through simulations.

preprint2020arXiv

Community Detection on Euclidean Random Graphs

We study the problem of community detection (CD) on Euclidean random geometric graphs where each vertex has two latent variables: a binary community label and a $\mathbb{R}^d$ valued location label which forms the support of a Poisson point process of intensity $λ$. A random graph is then drawn with edge probabilities dependent on both the community and location labels. In contrast to the stochastic block model (SBM) that has no location labels, the resulting random graph contains many more short loops due to the geometric embedding. We consider the recovery of the community labels, partial and exact, using the random graph and the location labels. We establish phase transitions for both sparse and logarithmic degree regimes, and provide bounds on the location of the thresholds, conjectured to be tight in the case of exact recovery. We also show that the threshold of the distinguishability problem, i.e., the testing between our model and the null model without community labels exhibits no phase-transition and in particular, does not match the weak recovery threshold (in contrast to the SBM).

preprint2020arXiv

Ergodicity and steady state analysis for Interference Queueing Networks

We analyze an interacting queueing network on $\mathbb{Z}^d$ that was introduced in Sankararaman-Baccelli-Foss (2019) as a model for wireless networks. We show that the marginals of the minimal stationary distribution have exponential tails. This is used to furnish asymptotics for the maximum steady state queue length in growing boxes around the origin. We also establish a decay of correlations which shows that the minimal stationary distribution is strongly mixing, and hence, ergodic with respect to translations on $\mathbb{Z}^d$.

preprint2020arXiv

Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapts to the unknown problem complexity. In the first setting, we consider the $K$ armed mixture bandits, where the mean reward of arm $i \in [K]$, is $μ_i+ \langle α_{i,t},θ^* \rangle $, with $α_{i,t} \in \mathbb{R}^d$ being the known context vector and $μ_i \in [-1,1]$ and $θ^*$ are unknown parameters. We define $\|θ^*\|$ as the problem complexity and consider a sequence of nested hypothesis classes, each positing a different upper bound on $\|θ^*\|$. Exploiting this, we propose Adaptive Linear Bandit (ALB), a novel phase based algorithm that adapts to the true problem complexity, $\|θ^*\|$. We show that ALB achieves regret scaling of $O(\|θ^*\|\sqrt{T})$, where $\|θ^*\|$ is apriori unknown. As a corollary, when $θ^*=0$, ALB recovers the minimax regret for the simple bandit algorithm without such knowledge of $θ^*$. ALB is the first algorithm that uses parameter norm as model section criteria for linear bandits. Prior state of art algorithms \cite{osom} achieve a regret of $O(L\sqrt{T})$, where $L$ is the upper bound on $\|θ^*\|$, fed as an input to the problem. In the second setting, we consider the standard linear bandit problem (with possibly an infinite number of arms) where the sparsity of $θ^*$, denoted by $d^* \leq d$, is unknown to the algorithm. Defining $d^*$ as the problem complexity, we show that ALB achieves $O(d^*\sqrt{T})$ regret, matching that of an oracle who knew the true sparsity level. This methodology is then extended to the case of finitely many arms and similar results are proven. This is the first algorithm that achieves such model selection guarantees. We further verify our results via synthetic and real-data experiments.

preprint2016arXiv

Performance-Oriented Association in Large Cellular Networks with Technology Diversity

The development of mobile virtual network operators, where multiple wireless technologies (e.g. 3G and 4G) or operators with non-overlapping bandwidths are pooled and shared is expected to provide enhanced service with broader coverage, without incurring additional infrastructure cost. However, their emergence poses an unsolved question on how to harness such a technology and bandwidth diversity. This paper addresses one of the simplest questions in this class, namely, the issue of associating each mobile to one of those bandwidths. Intriguingly, this association issue is intrinsically distinct from those in traditional networks. We first propose a generic stochastic geometry model lending itself to analyzing a wide class of association policies exploiting various information on the network topology, e.g. received pilot powers and fading values. This model firstly paves the way for tailoring and designing an optimal association scheme to maximize any performance metric of interest (e.g. the probability of coverage) subject to the information known about the network. In this class of optimal association, we prove a result that the performance improves as the information known about the network increases. Secondly, this model is used to quantify the performance of any arbitrary association policy and not just the optimal association policy. We propose a simple policy called the Max-Ratio which is not-parametric, i.e. it dispenses with the statistical knowledge of base station deployments in stochastic geometry models. We also prove that this simple policy is optimal in a certain limiting regime of the wireless environment. Through simulations, we provide insights into (i) a practical compromise between performance gain and cost of estimating information and; (ii) the selection of association schemes under environments with different propagation models, i.e. path-loss exponents.

Abishek Sankararaman

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits

Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits

Decentralized Competing Bandits in Non-Stationary Matching Markets

Multi-Agent Low-Dimensional Linear Bandits

Community Detection on Euclidean Random Graphs

Ergodicity and steady state analysis for Interference Queueing Networks

Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

Performance-Oriented Association in Large Cellular Networks with Technology Diversity