Source author record

Sanjay Shakkottai

Sanjay Shakkottai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Social and Information Networks Information Theory math.IT Networking and Internet Architecture math.OC Distributed, Parallel, and Cluster Computing physics.soc-ph Artificial Intelligence Computer Vision Information Retrieval Populations and Evolution Systems and Control

Catalog footprint

What is connected

33works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability

Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: even with exact prior scores, their outputs are biased, and in low-temperature regimes their discretizations can become unstable. We characterize this bias by introducing a tractable surrogate path connecting the true posterior to a standard Gaussian and comparing it to the sampler's path. Their density ratio satisfies a parabolic PDE whose reaction term measures the accumulated bias. A Feynman-Kac representation then expresses the Radon-Nikodym correction as an explicit path expectation, identifying which posterior regions are over- or under-sampled. We apply this framework to DPS and STSL, a related sampler. For DPS, the correction is an Ornstein-Uhlenbeck path expectation coupling the data conditional covariance with the reward curvature, revealing where DPS over- or under-samples. Next, we reinterpret STSL as an auxiliary drift that steers trajectories toward low-uncertainty regions, flattening the spatially varying part of the DPS reaction term. Finally, we characterize early guidance-stopping, a common mitigation for low-temperature instabilities caused by forward-Euler integration of the vector field. Together, these results clarify sampler bias, explain existing correctives, and guide stable variant designs.

preprint2022arXiv

Does Optimal Source Task Performance Imply Optimal Pre-training for a Target Task?

Fine-tuning of pre-trained deep nets is commonly used to improve accuracies and training times for neural nets. It is generally assumed that pre-training a net for optimal source task performance best prepares it for fine-tuning to learn an arbitrary target task. This is generally not true. Stopping source task training, prior to optimal performance, can create a pre-trained net better suited for fine-tuning to learn a new task. We perform several experiments demonstrating this effect, as well as the influence of the amount of training and of learning rate. Additionally, our results indicate that this reflects a general loss of learning ability that even extends to relearning the source task.

preprint2022arXiv

FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

The Federated Averaging (FedAvg) algorithm, which consists of alternating between a few local stochastic gradient updates at client nodes, followed by a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have illustrated that the output model of FedAvg, after a few fine-tuning steps, leads to a model that generalizes well to new unseen tasks. This surprising performance of such a simple method, however, is not fully understood from a theoretical point of view. In this paper, we formally investigate this phenomenon in the multi-task linear representation setting. We show that the reason behind generalizability of the FedAvg's output is its power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via local updates. We formally establish the iteration complexity required by the clients for proving such result in the setting where the underlying shared representation is a linear map. To the best of our knowledge, this is the first such result for any setting. We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.

preprint2022arXiv

How Does the Task Landscape Affect MAML Performance?

Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard non-adaptive learning (NAL), and little is understood about how much MAML improves over NAL in terms of the fast adaptability of their solutions in various scenarios. We analytically address this issue in a linear regression setting consisting of a mixture of easy and hard tasks, where hardness is related to the rate that gradient descent converges on the task. Specifically, we prove that in order for MAML to achieve substantial gain over NAL, (i) there must be some discrepancy in hardness among the tasks, and (ii) the optimal solutions of the hard tasks must be closely packed with the center far from the center of the easy tasks optimal solutions. We also give numerical and analytical results suggesting that these insights apply to two-layer neural networks. Finally, we provide few-shot image classification experiments that support our insights for when MAML should be used and emphasize the importance of training MAML on hard tasks in practice.

preprint2022arXiv

Linear Bandit Algorithms with Sublinear Time Complexity

We propose two linear bandits algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking regret guarantees. Existing approximate MIPS solvers run in sublinear time. We extend those solvers and present theoretical guarantees for online learning problems, where adaptivity (i.e., a later step depends on the feedback in previous steps) becomes a unique challenge. We then explicitly characterize the tradeoff between the per-step complexity and regret. For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\tilde O(\sqrt{T})$ regret. Empirically, we evaluate our proposed algorithms in a synthetic environment and a real-world online movie recommendation problem. Our proposed algorithms can deliver a more than 72 times speedup compared to the linear time baselines while retaining similar regret.

preprint2022arXiv

Multi-Agent Low-Dimensional Linear Bandits

We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $θ^* \in \mathbb{R}^d$. The side information consists of a finite collection of low-dimensional subspaces, one of which contains $θ^*$. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentralized algorithm, where agents communicate subspace indices with each other and each agent plays a projected variant of LinUCB on the corresponding (low-dimensional) subspace. By distributing the search for the optimal subspace across users and learning of the unknown vector by each agent in the corresponding low-dimensional subspace, we show that the per-agent finite-time regret is much smaller than the case when agents do not communicate. We finally complement these results through simulations.

preprint2022arXiv

PAC Generalization via Invariant Representations

One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find \textit{invariant representations} of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant representations might allow us to learn models with out-of-distribution guarantees, i.e., models that are robust to interventions in the SEM. To address the invariant representation problem in a {\em finite sample} setting, we consider the notion of $ε$-approximate invariance. We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds \textit{probabilistically} over a family of linear SEMs without faithfulness assumptions. Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant size subset of in-degree bounded nodes. We also show how to extend our results to a linear indirect observation model that incorporates latent variables.

preprint2022arXiv

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non adaptive methods in this setting. Specifically, all prior works rely on some subset of the following assumptions: (i) uniformly-bounded gradient norms, (ii) uniformly-bounded stochastic gradient variance (or even noise support), (iii) conditional independence between the step size and stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order optimal convergence rate of $\mathcal{O}\left(\frac{\mathrm{poly}\log(T)}{\sqrt{T}}\right)$ after $T$ iterations under the same assumptions as optimally-tuned non adaptive SGD (unbounded gradient norms and affine noise variance scaling), and crucially, without needing any tuning parameters. We thus establish that adaptive gradient methods exhibit order-optimal convergence in much broader regimes than previously understood.

preprint2021arXiv

Contextual Bandits with Stochastic Experts

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem. In our problem setting, we assume access to a class of stochastic experts, where each expert is a conditional distribution over the arms given a context. We propose upper-confidence bound (UCB) algorithms for this problem, which employ two different importance sampling based estimators for the mean reward for each expert. Both these estimators leverage information leakage among the experts, thus using samples collected under all the experts to estimate the mean reward of any given expert. This leads to instance dependent regret bounds of $\mathcal{O}\left(λ(\pmbμ)\mathcal{M}\log T/Δ\right)$, where $λ(\pmbμ)$ is a term that depends on the mean rewards of the experts, $Δ$ is the smallest gap between the mean reward of the optimal expert and the rest, and $\mathcal{M}$ quantifies the information leakage among the experts. We show that under some assumptions $λ(\pmbμ)$ is typically $\mathcal{O}(\log N)$, where $N$ is the number of experts. We implement our algorithm with stochastic experts generated from cost-sensitive classification oracles and show superior empirical performance on real-world datasets, when compared to other state of the art contextual bandit algorithms.

preprint2021arXiv

Improved Algorithms for Misspecified Linear Markov Decision Processes

For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as $K \rightarrow \infty$. (P3) It does not require $\varepsilon_{\text{mis}}$ as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of $\varepsilon_{\text{tol}}$, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB algorithm, which Takemura et al. [2021] recently showed satisfies (P3) for contextual bandits. We also provide an intuitive interpretation of their result, which informs the design of our algorithm.

preprint2021arXiv

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret square root in the number of episodes given access to a certain computation oracle.

preprint2021arXiv

Robust Multi-Agent Multi-Armed Bandits

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include $n$ honest and $m$ malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.

preprint2021arXiv

Stochastic Linear Bandits with Protected Subspace

We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a \textit{protected space}) given only zero-order stochastic oracle access to both the objective itself and protected subspace. In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action. Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space, and (in almost all rounds) plays optimistically with respect to a confidence set for this space. We provide a $\tilde{O}(sd\sqrt{T})$ regret upper bound in the case where the action space is the complete unit ball in $\mathbb{R}^d$, $s < d$ is the dimension of the protected subspace, and $T$ is the time horizon. Moreover, we demonstrate that a discrete action space can lead to linear regret with an optimistic algorithm, reinforcing the sub-optimality of optimism in certain settings. We also show that protection constraints imply that for certain settings, no consistent algorithm can have a regret smaller than $Ω(T^{3/4}).$ We finally empirically validate our results with synthetic and real datasets.

preprint2020arXiv

Contextual Blocking Bandits

We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a fixed and known number of future time steps. The above contextual setting, which captures important scenarios such as recommendation systems or ad placement with diverse users, invalidates greedy solution techniques that are effective for its non-contextual counterpart (Basu et al., NeurIPS19). Assuming knowledge of the context distribution and the mean reward of each arm-context pair, we cast the problem as an online bipartite matching problem, where the right-vertices (contexts) arrive stochastically and the left-vertices (arms) are blocked for a finite number of rounds each time they are matched. This problem has been recently studied in the full-information case, where competitive ratio bounds have been derived. We focus on the bandit setting, where the reward distributions are initially unknown; we propose a UCB-based variant of the full-information algorithm that guarantees a $\mathcal{O}(\log T)$-regret w.r.t. an $α$-optimal strategy in $T$ time steps, matching the $Ω(\log(T))$ regret lower bound in this setting. Due to the time correlations caused by blocking, existing techniques for upper bounding regret fail. For proving our regret bounds, we introduce the novel concepts of delayed exploitation and opportunistic subsampling and combine them with ideas from combinatorial bandits and non-stationary Markov chains coupling.

preprint2020arXiv

Importance Weighted Generative Networks

Deep generative networks can simulate from a complex target distribution, by minimizing a loss with respect to samples from that distribution. However, often we do not have direct access to our target distribution - our data may be subject to sample selection bias, or may be from a different but related distribution. We present methods based on importance weighting that can estimate the loss with respect to a target distribution, even if we cannot access that distribution directly, in a variety of settings. These estimators, which differentially weight the contribution of data to the loss function, offer both theoretical guarantees and impressive empirical performance.

preprint2020arXiv

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

We consider a covariate shift problem where one has access to several different training datasets for the same learning problem and a small validation set which possibly differs from all the individual training distributions. This covariate shift is caused, in part, due to unobserved features in the datasets. The objective, then, is to find the best mixture distribution over the training datasets (with only observed features) such that training a learning algorithm using this mixture has the best validation performance. Our proposed algorithm, ${\sf Mix\&Match}$, combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions) over the space of mixtures, for this task. We prove simple regret guarantees for our algorithm with respect to recovering the optimal mixture, given a total budget of SGD evaluations. Finally, we validate our algorithm on two real-world datasets.

preprint2020arXiv

Task-Robust Model-Agnostic Meta-Learning

Meta-learning methods have shown an impressive ability to train models that rapidly learn new tasks. However, these methods only aim to perform well in expectation over tasks coming from some particular distribution that is typically equivalent across meta-training and meta-testing, rather than considering worst-case task performance. In this work we introduce the notion of "task-robustness" by reformulating the popular Model-Agnostic Meta-Learning (MAML) objective [Finn et al. 2017] such that the goal is to minimize the maximum loss over the observed meta-training tasks. The solution to this novel formulation is task-robust in the sense that it places equal importance on even the most difficult and/or rare tasks. This also means that it performs well over all distributions of the observed tasks, making it robust to shifts in the task distribution between meta-training and meta-testing. We present an algorithm to solve the proposed min-max problem, and show that it converges to an $ε$-accurate point at the optimal rate of $\mathcal{O}(1/ε^2)$ in the convex setting and to an $(ε, δ)$-stationary point at the rate of $\mathcal{O}(\max\{1/ε^5, 1/δ^5\})$ in nonconvex settings. We also provide an upper bound on the new task generalization error that captures the advantage of minimizing the worst-case task loss, and demonstrate this advantage in sinusoid regression and image classification experiments.

preprint2016arXiv

Contextual Bandits with Latent Confounders: An NMF Approach

Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m \ll L,K$). The arm choice and the latent confounder causally determines the reward while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $\mathbf{U}$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $\mathbf{A}$ ($L \times m$) and $\mathbf{W}$ ($m \times K$). This insight enables us to propose an $ε$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms), that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of $\mathcal{O}\left(L\mathrm{poly}(m, \log K) \log T \right)$ at time $T$, as compared to $\mathcal{O}(LK\log T)$ for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition. We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of $\mathcal{O}\left(Km\log T\right)$. These are the first regret guarantees for online matrix completion with bandit feedback, when the rank is greater than one. We further compare the performance of our algorithm with the state of the art, on synthetic and real world data-sets.

preprint2015arXiv

Detecting Sponsored Recommendations

With a vast number of items, web-pages, and news to choose from, online services and the customers both benefit tremendously from personalized recommender systems. Such systems however provide great opportunities for targeted advertisements, by displaying ads alongside genuine recommendations. We consider a biased recommendation system where such ads are displayed without any tags (disguised as genuine recommendations), rendering them indistinguishable to a single user. We ask whether it is possible for a small subset of collaborating users to detect such a bias. We propose an algorithm that can detect such a bias through statistical analysis on the collaborating users' feedback. The algorithm requires only binary information indicating whether a user was satisfied with each of the recommended item or not. This makes the algorithm widely appealing to real world issues such as identification of search engine bias and pharmaceutical lobbying. We prove that the proposed algorithm detects the bias with high probability for a broad class of recommendation systems when sufficient number of users provide feedback on sufficient number of recommendations. We provide extensive simulations with real data sets and practical recommender systems, which confirm the trade offs in the theoretical guarantees.

preprint2015arXiv

Scheduling Storms and Streams in the Cloud

Motivated by emerging big streaming data processing paradigms (e.g., Twitter Storm, Streaming MapReduce), we investigate the problem of scheduling graphs over a large cluster of servers. Each graph is a job, where nodes represent compute tasks and edges indicate data-flows between these compute tasks. Jobs (graphs) arrive randomly over time, and upon completion, leave the system. When a job arrives, the scheduler needs to partition the graph and distribute it over the servers to satisfy load balancing and cost considerations. Specifically, neighboring compute tasks in the graph that are mapped to different servers incur load on the network; thus a mapping of the jobs among the servers incurs a cost that is proportional to the number of "broken edges". We propose a low complexity randomized scheduling algorithm that, without service preemptions, stabilizes the system with graph arrivals/departures; more importantly, it allows a smooth trade-off between minimizing average partitioning cost and average queue lengths. Interestingly, to avoid service preemptions, our approach does not rely on a Gibbs sampler; instead, we show that the corresponding limiting invariant measure has an interpretation stemming from a loss system.

preprint2015arXiv

Wireless Scheduling with Partial Channel State Information: Large Deviations and Optimality

We consider a server serving a time-slotted queued system of multiple packet-based flows, with exogenous packet arrivals and time-varying service rates. At each time, the server can observe instantaneous service rates for only a subset of flows (from within a fixed collection of observable subsets) before scheduling a flow in the subset for service. We are interested in queue-length aware scheduling to keep the queues short, and develop scheduling algorithms that use only partial service rate information from subsets of channels to minimize the likelihood of queue overflow in the system. Specifically, we present a new joint subset-sampling and scheduling algorithm called Max-Exp that uses only the current queue lengths to pick a subset of flows, and subsequently schedules a flow using the Exponential rule. When the collection of observable subsets is disjoint, we show that Max-Exp achieves the best exponential decay rate, among all scheduling algorithms using partial information, of the tail of the longest queue in the system. Towards this, we employ novel analytical techniques for studying the performance of scheduling algorithms using partial state, which may be of independent interest. These include new sample-path large deviations results for processes obtained by non-random, predictable sampling of sequences of independent and identically distributed random variables. A consequence of these results is that scheduling with partial state information yields a rate function significantly different from scheduling with full channel information. In the special case when the observable subsets are singleton flows, i.e., when there is effectively no a priori channel-state information, Max-Exp reduces to simply serving the flow with the longest queue; thus, our results show that to always serve the longest queue in the absence of any channel-state information is large-deviations optimal.

preprint2014arXiv

Epidemic Spreading with External Agents

We study epidemic spreading processes in large networks, when the spread is assisted by a small number of external agents: infection sources with bounded spreading power, but whose movement is unrestricted vis-à-vis the underlying network topology. For networks which are `spatially constrained', we show that the spread of infection can be significantly speeded up even by a few such external agents infecting randomly. Moreover, for general networks, we derive upper-bounds on the order of the spreading time achieved by certain simple (random/greedy) external-spreading policies. Conversely, for certain common classes of networks such as line graphs, grids and random geometric graphs, we also derive lower bounds on the order of the spreading time over all (potentially network-state aware and adversarial) external-spreading policies; these adversarial lower bounds match (up to logarithmic factors) the spreading time achieved by an external agent with a random spreading policy. This demonstrates that random, state-oblivious infection-spreading by an external agent is in fact order-wise optimal for spreading in such spatially constrained networks.

preprint2014arXiv

Localized epidemic detection in networks with overwhelming noise

We consider the problem of detecting an epidemic in a population where individual diagnoses are extremely noisy. The motivation for this problem is the plethora of examples (influenza strains in humans, or computer viruses in smartphones, etc.) where reliable diagnoses are scarce, but noisy data plentiful. In flu/phone-viruses, exceedingly few infected people/phones are professionally diagnosed (only a small fraction go to a doctor) but less reliable secondary signatures (e.g., people staying home, or greater-than-typical upload activity) are more readily available. These secondary data are often plagued by unreliability: many people with the flu do not stay home, and many people that stay home do not have the flu. This paper identifies the precise regime where knowledge of the contact network enables finding the needle in the haystack: we provide a distributed, efficient and robust algorithm that can correctly identify the existence of a spreading epidemic from highly unreliable local data. Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available. Our results show it succeeds in the presence of partial information about the contact network, and also when there is not a single "patient zero", but rather many (hundreds, in our examples) of initial patient-zeroes, spread across the graph.

preprint2014arXiv

Online Collaborative-Filtering on Graphs

A common phenomena in modern recommendation systems is the use of feedback from one user to infer the `value' of an item to other users. This results in an exploration vs. exploitation trade-off, in which items of possibly low value have to be presented to users in order to ascertain their value. Existing approaches to solving this problem focus on the case where the number of items are small, or admit some underlying structure -- it is unclear, however, if good recommendation is possible when dealing with content-rich settings with unstructured content. We consider this problem under a simple natural model, wherein the number of items and the number of item-views are of the same order, and an `access-graph' constrains which user is allowed to see which item. Our main insight is that the presence of the access-graph in fact makes good recommendation possible -- however this requires the exploration policy to be designed to take advantage of the access-graph. Our results demonstrate the importance of `serendipity' in exploration, and how higher graph-expansion translates to a higher quality of recommendations; it also suggests a reason why in some settings, simple policies like Twitter's `Latest-First' policy achieve a good performance. From a technical perspective, our model presents a way to study exploration-exploitation tradeoffs in settings where the number of `trials' and `strategies' are large (potentially infinite), and more importantly, of the same order. Our algorithms admit competitive-ratio guarantees which hold for the worst-case user, under both finite-population and infinite-horizon settings, and are parametrized in terms of properties of the underlying graph. Conversely, we also demonstrate that improperly-designed policies can be highly sub-optimal, and that in many settings, our results are order-wise optimal.

preprint2014arXiv

Serving Content with Unknown Demand:the High-Dimensional Regime

In this paper we look at content placement in the high-dimensional regime: there are n servers, and O(n) distinct types of content. Each server can store and serve O(1) types at any given time. Demands for these content types arrive, and have to be served in an online fashion; over time, there are a total of O(n) of these demands. We consider the algorithmic task of content placement: determining which types of content should be on which server at any given time, in the setting where the demand statistics (i.e. the relative popularity of each type of content) are not known a-priori, but have to be inferred from the very demands we are trying to satisfy. This is the high- dimensional regime because this scaling (everything being O(n)) prevents consistent estimation of demand statistics; it models many modern settings where large numbers of users, servers and videos/webpages interact in this way. We characterize the performance of any scheme that separates learning and placement (i.e. which use a portion of the demands to gain some estimate of the demand statistics, and then uses the same for the remaining demands), showing it is order-wise strictly suboptimal. We then study a simple adaptive scheme - which myopically attempts to store the most recently requested content on idle servers - and show it outperforms schemes that separate learning and placement. Our results also generalize to the setting where the demand statistics change with time. Overall, our results demonstrate that separating the estimation of demand, and the subsequent use of the same, is strictly suboptimal.

preprint2013arXiv

Distinguishing Infections on Different Graph Topologies

The history of infections and epidemics holds famous examples where understanding, containing and ultimately treating an outbreak began with understanding its mode of spread. Influenza, HIV and most computer viruses, spread person to person, device to device, through contact networks; Cholera, Cancer, and seasonal allergies, on the other hand, do not. In this paper we study two fundamental questions of detection: first, given a snapshot view of a (perhaps vanishingly small) fraction of those infected, under what conditions is an epidemic spreading via contact (e.g., Influenza), distinguishable from a "random illness" operating independently of any contact network (e.g., seasonal allergies); second, if we do have an epidemic, under what conditions is it possible to determine which network of interactions is the main cause of the spread -- the causative network -- without any knowledge of the epidemic, other than the identity of a minuscule subsample of infected nodes? The core, therefore, of this paper, is to obtain an understanding of the diagnostic power of network information. We derive sufficient conditions networks must satisfy for these problems to be identifiable, and produce efficient, highly scalable algorithms that solve these problems. We show that the identifiability condition we give is fairly mild, and in particular, is satisfied by two common graph topologies: the grid, and the Erdos-Renyi graphs.

preprint2013arXiv

Epidemic Thresholds with External Agents

We study the effect of external infection sources on phase transitions in epidemic processes. In particular, we consider an epidemic spreading on a network via the SIS/SIR dynamics, which in addition is aided by external agents - sources unconstrained by the graph, but possessing a limited infection rate or virulence. Such a model captures many existing models of externally aided epidemics, and finds use in many settings - epidemiology, marketing and advertising, network robustness, etc. We provide a detailed characterization of the impact of external agents on epidemic thresholds. In particular, for the SIS model, we show that any external infection strategy with constant virulence either fails to significantly affect the lifetime of an epidemic, or at best, sustains the epidemic for a lifetime which is polynomial in the number of nodes. On the other hand, a random external-infection strategy, with rate increasing linearly in the number of infected nodes, succeeds under some conditions to sustain an exponential epidemic lifetime. We obtain similar sharp thresholds for the SIR model, and discuss the relevance of our results in a variety of settings.

preprint2013arXiv

On the Role of Mobility for Multi-message Gossip

We consider information dissemination in a large $n$-user wireless network in which $k$ users wish to share a unique message with all other users. Each of the $n$ users only has knowledge of its own contents and state information; this corresponds to a one-sided push-only scenario. The goal is to disseminate all messages efficiently, hopefully achieving an order-optimal spreading rate over unicast wireless random networks. First, we show that a random-push strategy -- where a user sends its own or a received packet at random -- is order-wise suboptimal in a random geometric graph: specifically, $Ω(\sqrt{n})$ times slower than optimal spreading. It is known that this gap can be closed if each user has "full" mobility, since this effectively creates a complete graph. We instead consider velocity-constrained mobility where at each time slot the user moves locally using a discrete random walk with velocity $v(n)$ that is much lower than full mobility. We propose a simple two-stage dissemination strategy that alternates between individual message flooding ("self promotion") and random gossiping. We prove that this scheme achieves a close to optimal spreading rate (within only a logarithmic gap) as long as the velocity is at least $v(n)=ω(\sqrt{\log n/k})$. The key insight is that the mixing property introduced by the partial mobility helps users to spread in space within a relatively short period compared to the optimal spreading time, which macroscopically mimics message dissemination over a complete graph.

preprint2012arXiv

Greedy Learning of Markov Network Structure

We propose a new yet natural algorithm for learning the graph structure of general discrete graphical models (a.k.a. Markov random fields) from samples. Our algorithm finds the neighborhood of a node by sequentially adding nodes that produce the largest reduction in empirical conditional entropy; it is greedy in the sense that the choice of addition is based only on the reduction achieved at that iteration. Its sequential nature gives it a lower computational complexity as compared to other existing comparison-based techniques, all of which involve exhaustive searches over every node set of a certain size. Our main result characterizes the sample complexity of this procedure, as a function of node degrees, graph size and girth in factor-graph representation. We subsequently specialize this result to the case of Ising models, where we provide a simple transparent characterization of sample complexity as a function of model and graph parameters. For tree graphs, our algorithm is the same as the classical Chow-Liu algorithm, and in that sense can be considered the extension of the same to graphs with cycles.

preprint2012arXiv

On the Effect of Channel Fading on Greedy Scheduling

Greedy Maximal Scheduling (GMS) is an attractive low-complexity scheme for scheduling in wireless networks. Recent work has characterized its throughput for the case when there is no fading/channel variations. This paper aims to understand the effect of channel variations on the relative throughput performance of GMS vis-a-vis that of an optimal scheduler facing the same fading. The effect is not a-priori obvious because, on the one hand, fading could help by decoupling/precluding global states that lead to poor GMS performance, while on the other hand fading adds another degree of freedom in which an event unfavourable to GMS could occur. We show that both these situations can occur when fading is adversarial. In particular, we first define the notion of a {\em Fading Local Pooling factor (F-LPF)}, and show that it exactly characterizes the throughput of GMS in this setting. We also derive general upper and lower bounds on F-LPF. Using these bounds, we provide two example networks - one where the relative performance of GMS is worse than if there were no fading, and one where it is better.

preprint2012arXiv

Towards a Queueing-Based Framework for In-Network Function Computation

We seek to develop network algorithms for function computation in sensor networks. Specifically, we want dynamic joint aggregation, routing, and scheduling algorithms that have analytically provable performance benefits due to in-network computation as compared to simple data forwarding. To this end, we define a class of functions, the Fully-Multiplexible functions, which includes several functions such as parity, MAX, and k th -order statistics. For such functions we exactly characterize the maximum achievable refresh rate of the network in terms of an underlying graph primitive, the min-mincut. In acyclic wireline networks, we show that the maximum refresh rate is achievable by a simple algorithm that is dynamic, distributed, and only dependent on local information. In the case of wireless networks, we provide a MaxWeight-like algorithm with dynamic flow splitting, which is shown to be throughput-optimal.

preprint2011arXiv

On Sharing Viral Video over an Ad Hoc Wireless Network

We consider the problem of broadcasting a viral video (a large file) over an ad hoc wireless network (e.g., students in a campus). Many smartphones are GPS enabled, and equipped with peer-to-peer (ad hoc) transmission mode, allowing them to wirelessly exchange files over short distances rather than use the carrier's WAN. The demand for the file however is transmitted through the social network (e.g., a YouTube link posted on Facebook). To address this coupled-network problem (demand on the social network; bandwidth on the wireless network) where the two networks have different topologies, we propose a file dissemination algorithm. In our scheme, users query their social network to find geographically nearby friends that have the desired file, and utilize the underlying ad hoc network to route the data via multi-hop transmissions. We show that for many popular models for social networks, the file dissemination time scales sublinearly with n; the number of users, compared to the linear scaling required if each user who wants the file must download it from the carrier's WAN.

preprint2007arXiv

Rethinking Information Theory for Mobile Ad Hoc Networks

The subject of this paper is the long-standing open problem of developing a general capacity theory for wireless networks, particularly a theory capable of describing the fundamental performance limits of mobile ad hoc networks (MANETs). A MANET is a peer-to-peer network with no pre-existing infrastructure. MANETs are the most general wireless networks, with single-hop, relay, interference, mesh, and star networks comprising special cases. The lack of a MANET capacity theory has stunted the development and commercialization of many types of wireless networks, including emergency, military, sensor, and community mesh networks. Information theory, which has been vital for links and centralized networks, has not been successfully applied to decentralized wireless networks. Even if this was accomplished, for such a theory to truly characterize the limits of deployed MANETs it must overcome three key roadblocks. First, most current capacity results rely on the allowance of unbounded delay and reliability. Second, spatial and timescale decompositions have not yet been developed for optimally modeling the spatial and temporal dynamics of wireless networks. Third, a useful network capacity theory must integrate rather than ignore the important role of overhead messaging and feedback. This paper describes some of the shifts in thinking that may be needed to overcome these roadblocks and develop a more general theory that we refer to as non-equilibrium information theory.

Sanjay Shakkottai

What is connected

Connect this record

See the researcher in context

Building this map preview

33 published item(s)

Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability

Does Optimal Source Task Performance Imply Optimal Pre-training for a Target Task?

FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

How Does the Task Landscape Affect MAML Performance?

Linear Bandit Algorithms with Sublinear Time Complexity

Multi-Agent Low-Dimensional Linear Bandits

PAC Generalization via Invariant Representations

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

Contextual Bandits with Stochastic Experts

Improved Algorithms for Misspecified Linear Markov Decision Processes

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Robust Multi-Agent Multi-Armed Bandits

Stochastic Linear Bandits with Protected Subspace

Contextual Blocking Bandits

Importance Weighted Generative Networks

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Task-Robust Model-Agnostic Meta-Learning

Contextual Bandits with Latent Confounders: An NMF Approach

Detecting Sponsored Recommendations

Scheduling Storms and Streams in the Cloud

Wireless Scheduling with Partial Channel State Information: Large Deviations and Optimality

Epidemic Spreading with External Agents

Localized epidemic detection in networks with overwhelming noise

Online Collaborative-Filtering on Graphs

Serving Content with Unknown Demand:the High-Dimensional Regime

Distinguishing Infections on Different Graph Topologies

Epidemic Thresholds with External Agents

On the Role of Mobility for Multi-message Gossip

Greedy Learning of Markov Network Structure

On the Effect of Channel Fading on Greedy Scheduling

Towards a Queueing-Based Framework for In-Network Function Computation

On Sharing Viral Video over an Ad Hoc Wireless Network

Rethinking Information Theory for Mobile Ad Hoc Networks