Source author record

Devavrat Shah

Devavrat Shah appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Catalog footprint

What is connected

60 works

26 topics

4 close collaborators

preprint | 2026 | arXiv

Causal Inference with Categorical Unobserved Confounder via Mixture Learning

Unobserved confounding is a fundamental challenge for estimating causal effects. To address unobserved confounding, recent literature has turned to two different approaches -- proxy variables and the use of multiple treatments. The first approach, commonly referred to as proximal causal inference, requires proxies to be assigned to specific asymmetric roles: treatment-inducing proxies (negative control exposures), variables that act as common causes of the treatment and outcome, and outcome-inducing proxies (negative control outcomes). In practice, however, identifying variables that satisfy these asymmetric roles can be difficult depending on the application domain. The second approach, commonly referred to as the "Deconfounder," deals with multiple conditionally independent treatments. There has been limited progress towards developing a consistent estimation method for this setting. As the primary contribution of this work, we establish that causal effects are identifiable in both settings when the unobserved confounder is categorical under suitable conditions. Our approach builds on a mixture learning perspective: we show that the underlying confounding structure can be recovered by identifying the corresponding mixture distribution. We propose an estimation procedure based on tensor decomposition, which allows consistent recovery of the latent structure and comes with non-asymptotic guarantees. Simulation studies and real data experiments demonstrate that the proposed method performs well even with limited data.

preprint | 2026 | arXiv

OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries

Retrieval benchmarks are increasingly saturating, but we argue that efficient search is far from a solved problem. We identify a class of queries we call oblique, which seek documents that instantiate a latent pattern, like finding all tweets that express an implicit stance, chat logs that demonstrate a particular failure mode, or transcripts that match an abstract scenario. We study three mechanisms through which obliqueness may arise and introduce OBLIQ-Bench, a suite of five oblique search problems over real long-tail corpora. OBLIQ-Bench exposes an overlooked asymmetry between retrieval and verification, where reasoning LLMs reliably recognize latent relevance whenever relevant documents are surfaced, but even sophisticated retrieval pipelines fail to surface most relevant documents in the first place. We hope that OBLIQ-Bench will drive research into retrieval architectures that efficiently capture latent patterns and implicit signals in large corpora.

preprint | 2024 | arXiv

Federated Optimization of Smooth Loss Functions

In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the staple for determining $ε$-approximate solutions to the ERM problem. Similar to standard optimization algorithms, the convergence analysis of FedAve only relies on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data too. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in parameter, $η$-Hölder smoothness in data, etc., we prove that the federated oracle complexity of FedLRGD scales like $ϕm(p/ε)^{Θ(d/η)}$ and that of FedAve scales like $ϕm(p/ε)^{3/4}$ (neglecting sub-dominant factors), where $ϕ\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. Then, we show that when $d$ is small and the loss function is sufficiently smooth in the data, FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.

preprint | 2023 | arXiv

Robust Max Entrywise Error Bounds for Tensor Estimation from Sparse Observations via Similarity Based Collaborative Filtering

Consider the task of estimating a 3-order $n \times n \times n$ tensor from noisy observations of randomly chosen entries in the sparse regime. We introduce a similarity based collaborative filtering algorithm for estimating a tensor from sparse observations and argue that it achieves sample complexity that nearly matches the conjectured computationally efficient lower bound on the sample complexity for the setting of low-rank tensors. Our algorithm uses the matrix obtained from the flattened tensor to compute similarity, and estimates the tensor entries using a nearest neighbor estimator. We prove that the algorithm recovers a finite rank tensor with maximum entry-wise error (MEE) and mean-squared-error (MSE) decaying to $0$ as long as each entry is observed independently with probability $p = Ω(n^{-3/2 + κ})$ for any arbitrarily small $κ> 0$. More generally, we establish robustness of the estimator, showing that when arbitrary noise bounded by $\varepsilon \geq 0$ is added to each observation, the estimation error with respect to MEE and MSE degrades by $\text{poly}(\varepsilon)$. Consequently, even if the tensor may not have finite rank but can be approximated within $\varepsilon \geq 0$ by a finite rank tensor, then the estimation error converges to $\text{poly}(\varepsilon)$. Our analysis sheds insight into the conjectured sample complexity lower bound, showing that it matches the connectivity threshold of the graph used by our algorithm for estimating similarity between coordinates.

preprint | 2022 | arXiv

Current Implicit Policies May Not Eradicate COVID-19

Successful predictive modeling of epidemics requires an understanding of the implicit feedback control strategies which are implemented by populations to modulate the spread of contagion. While this task of capturing endogenous behavior can be achieved through intricate modeling assumptions, we find that a population's reaction to case counts can be described through a second order affine dynamical system with linear control which fits well to the data across different regions and times throughout the COVID-19 pandemic. The model fits the data well both in and out of sample across the 50 states of the United States, with comparable $R^2$ scores to state of the art ensemble predictions. In contrast to recent models of epidemics, rather than assuming that individuals directly control the contact rate which governs the spread of disease, we assume that individuals control the rate at which they vary their number of interactions, i.e. they control the derivative of the contact rate. We propose an implicit feedback law for this control input and verify that it correlates with policies taken throughout the pandemic. A key takeaway of the dynamical model is that the "stable" point of case counts is non-zero, i.e. COVID-19 will not be eradicated under the current collection of policies and strategies, and additional policies are needed to fully eradicate it quickly. Hence, we suggest alternative implicit policies which focus on making interventions (such as vaccinations and mobility restrictions) a function of cumulative case counts, for which our results suggest a better possibility of eradicating COVID-19.

preprint | 2022 | arXiv

Gradient Descent for Low-Rank Functions

Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an $ε$-approximate stationary point of a $p$-dimensional function by first identifying $r \leq p$ significant directions, and then estimating the true $p$-dimensional gradient at every iteration by computing directional derivatives only along those $r$ directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are $\mathcal{O}(r \log(1/ε) + rp)$ and $\mathcal{O}(r/ε^2 + rp)$, respectively. When $r \ll p$, these complexities are smaller than the known complexities of $\mathcal{O}(p \log(1/ε))$ and $\mathcal{O}(p/ε^2)$ of GD in the strongly convex and non-convex settings, respectively. Thus, LRGD significantly reduces the computational cost of gradient-based methods for sufficiently low-rank functions. In the course of our analysis, we also formally define and characterize the classes of exact and approximately low-rank functions.
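
The two phases described in the abstract above (identify $r$ directions, then descend using only directional derivatives along them) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of central finite differences as the directional-derivative oracle, and the choice of random Gaussian probe points are all assumptions made for the example.

```python
import numpy as np

def directional_derivative(f, x, u, h=1e-5):
    """Central finite-difference estimate of the derivative of f at x along u."""
    return (f(x + h * u) - f(x - h * u)) / (2.0 * h)

def lrgd(f, x0, r, step, iters=200, seed=0):
    """Hypothetical sketch of low-rank gradient descent.

    Phase 1 spends r * p directional derivatives to find r directions.
    Phase 2 spends only r directional derivatives per iteration.
    """
    rng = np.random.default_rng(seed)
    p = x0.size
    basis = np.eye(p)

    # Phase 1: full gradients at r random points (assumed probing scheme).
    grads = np.empty((p, r))
    for k in range(r):
        z = rng.standard_normal(p)
        grads[:, k] = [directional_derivative(f, z, e) for e in basis]
    U, _ = np.linalg.qr(grads)  # p x r matrix with orthonormal columns

    # Phase 2: approximate the gradient by its projection onto span(U).
    x = x0.astype(float).copy()
    for _ in range(iters):
        coeffs = np.array(
            [directional_derivative(f, x, U[:, k]) for k in range(r)]
        )
        x -= step * (U @ coeffs)
    return x
```

For an exactly low-rank objective such as $f(x) = \tfrac12\|Ax - b\|^2$ with $A \in \mathbb{R}^{r \times p}$, every gradient lies in the row space of $A$, so the projected gradient in phase 2 coincides with the true gradient up to finite-difference error.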

preprint | 2022 | arXiv

On Multivariate Singular Spectrum Analysis and its Variants

We introduce and analyze a variant of multivariate singular spectrum analysis (mSSA), a popular time series method to impute and forecast a multivariate time series. Under a spatio-temporal factor model we introduce, given $N$ time series and $T$ observations per time series, we establish that the prediction mean-squared error for both imputation and out-of-sample forecasting effectively scales as $1 / \sqrt{\min(N, T )T}$. This is an improvement over: (i) $1 /\sqrt{T}$ error scaling of SSA, the restriction of mSSA to a univariate time series; (ii) $1/\min(N, T)$ error scaling for matrix estimation methods which do not exploit temporal structure in the data. The spatio-temporal model we introduce includes any finite sums and products of: harmonics, polynomials, differentiable periodic functions, and Hölder continuous functions. Our out-of-sample forecasting result could be of independent interest for online learning under a spatio-temporal factor model. Empirically, on benchmark datasets, our variant of mSSA performs competitively with state-of-the-art neural-network time series methods (e.g. DeepAR, LSTM) and significantly outperforms classical methods such as vector autoregression (VAR). Finally, we propose extensions of mSSA: (i) a variant to estimate time-varying variance of a time series; (ii) a tensor variant which has better sample complexity for certain regimes of $N$ and $T$.
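
To make the imputation/denoising step concrete, here is a rough sketch of one common way to set up an mSSA-style estimator: reshape each series into a Page matrix, stack the Page matrices side by side, and keep only the top singular components. The abstract does not spell out the construction, so the stacking layout, the function names, and the hard rank cutoff are assumptions made for illustration; forecasting and missing-data handling are omitted.

```python
import numpy as np

def stacked_page_matrix(X, L):
    """Stack the L x (T/L) Page matrices of N series side by side.

    X has shape (N, T); column j of a series' Page matrix holds the
    consecutive entries x[j*L : (j+1)*L].
    """
    N, T = X.shape
    if T % L != 0:
        raise ValueError("T must be divisible by L in this sketch")
    pages = [x.reshape(T // L, L).T for x in X]
    return np.hstack(pages)

def mssa_denoise(X, L, rank):
    """Low-rank approximation of the stacked Page matrix, mapped back to series."""
    X = np.asarray(X, dtype=float)
    N, T = X.shape
    M = stacked_page_matrix(X, L)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # keep top `rank` components

    cols = T // L
    out = np.empty_like(X)
    for n in range(N):
        block = M_hat[:, n * cols:(n + 1) * cols]
        out[n] = block.T.reshape(T)  # undo the Page reshaping
    return out
```

A single harmonic has a Page matrix of rank at most 2, and harmonics sharing a frequency share the same column space, which is why stacking several series can help: the stacked matrix stays low rank while gaining columns.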

preprint | 2022 | arXiv

Regret, stability & fairness in matching markets with bandit learners

Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.

preprint | 2022 | arXiv

Unifying Epidemic Models with Mixtures

The COVID-19 pandemic has emphasized the need for a robust understanding of epidemic models. Current models of epidemics are classified as either mechanistic or non-mechanistic: mechanistic models make explicit assumptions on the dynamics of disease, whereas non-mechanistic models make assumptions on the form of observed time series. Here, we introduce a simple mixture-based model which bridges the two approaches while retaining benefits of both. The model represents time series of cases and fatalities as a mixture of Gaussian curves, providing a flexible function class to learn from data compared to traditional mechanistic models. Although the model is non-mechanistic, we show that it arises as the natural outcome of a stochastic process based on a networked SIR framework. This allows learned parameters to take on a more meaningful interpretation compared to similar non-mechanistic models, and we validate the interpretations using auxiliary mobility data collected during the COVID-19 pandemic. We provide a simple learning algorithm to identify model parameters and establish theoretical results which show the model can be efficiently learned from data. Empirically, we find the model to have low prediction error. The model is available live at covidpredictions.mit.edu. Ultimately, this allows us to systematically understand the impacts of interventions on COVID-19, which is critical in developing data-driven solutions to controlling epidemics.

preprint | 2021 | arXiv

Time varying regression with hidden linear dynamics

We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system. Counterintuitively, we show that when the underlying dynamics are stable the parameters of this model can be estimated from data by combining just two ordinary least squares estimates. We offer a finite sample guarantee on the estimation error of our method and discuss certain advantages it has over Expectation-Maximization (EM), which is the main approach proposed by prior work.

preprint | 2021 | arXiv

tspDB: Time Series Predict DB

A major bottleneck of the current Machine Learning (ML) workflow is the time consuming, error prone engineering required to get data from a datastore or a database (DB) to the point an ML algorithm can be applied to it. Hence, we explore the feasibility of directly integrating prediction functionality on top of a data store or DB. Such a system ideally: (i) provides an intuitive prediction query interface which alleviates the unwieldy data engineering; (ii) provides state-of-the-art statistical accuracy while ensuring incremental model update, low model training time and low latency for making predictions. As the main contribution we explicitly instantiate a proof-of-concept, tspDB, which directly integrates with PostgreSQL. We rigorously test tspDB's statistical and computational performance against the state-of-the-art time series algorithms, including a Long-Short-Term-Memory (LSTM) neural network and DeepAR (industry standard deep learning library by Amazon). Statistically, on standard time series benchmarks, tspDB outperforms LSTM and DeepAR with 1.1-1.3x higher relative accuracy. Computationally, tspDB is 59-62x and 94-95x faster compared to LSTM and DeepAR in terms of median ML model training time and prediction query latency, respectively. Further, compared to PostgreSQL's bulk insert time and its SELECT query latency, tspDB is slower only by 1.3x and 2.6x respectively. That is, tspDB is a real-time prediction system in that its model training / prediction query time is similar to just inserting / reading data from a DB. As an algorithmic contribution, we introduce an incremental multivariate matrix factorization based time series method, which tspDB is built off. We show this method also allows one to produce reliable prediction intervals by accurately estimating the time-varying variance of a time series, thereby addressing an important problem in time series analysis.

preprint | 2020 | arXiv

Deconvolution with Unknown Error Distribution Interpreted as Blind Isotonic Regression

Deconvolution is a statistical inverse problem to estimate the distribution of a random variable based on its noisy observations. Despite the extensive studies on the topic, deconvolution with unknown noise distribution remains a notoriously hard problem. We propose a matrix-based viewpoint for collective deconvolution that subsumes the setup with repeated measurements as a special case. As the main result, we describe a simple algorithm that partially utilizes matrix structure to solve the deconvolution problem and provide a non-asymptotic error analysis for the algorithm. We show that the proposed algorithm achieves the minimax optimal rate for deconvolution in a restricted sense. We also remark on the connection between collective deconvolution and so-called statistical seriation as a byproduct of our matrix viewpoint. We conjecture that the link suggests that collective deconvolution, as well as deconvolution with repeated measurements, is intrinsically much easier than usual deconvolution of a single distribution.

preprint | 2020 | arXiv

Estimation of Skill Distributions

In this paper, we study the problem of learning the skill distribution of a population of agents from observations of pairwise games in a tournament. These games are played among randomly drawn agents from the population. The agents in our model can be individuals, sports teams, or Wall Street fund managers. Formally, we postulate that the likelihoods of game outcomes are governed by the Bradley-Terry-Luce (or multinomial logit) model, where the probability of an agent beating another is the ratio between its skill level and the pairwise sum of skill levels, and the skill parameters are drawn from an unknown skill density of interest. The problem is, in essence, to learn a distribution from noisy, quantized observations. We propose a simple and tractable algorithm that learns the skill density with near-optimal minimax mean squared error scaling as $n^{-1+\varepsilon}$, for any $\varepsilon>0$, when the density is smooth. Our approach brings together prior work on learning skill parameters from pairwise comparisons with kernel density estimation from non-parametric statistics. Furthermore, we prove minimax lower bounds which establish minimax optimality of the skill parameter estimation technique used in our algorithm. These bounds utilize a continuum version of Fano's method along with a covering argument. We apply our algorithm to various soccer leagues and world cups, cricket world cups, and mutual funds. We find that the entropy of a learnt distribution provides a quantitative measure of skill, which provides rigorous explanations for popular beliefs about perceived qualities of sporting events, e.g., soccer league rankings. Finally, we apply our method to assess the skill distributions of mutual funds. Our results shed light on the abundance of low quality funds prior to the Great Recession of 2008, and the domination of the industry by more skilled funds after the financial crisis.
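
The game model stated in words in the abstract above can be written out as follows. The symbols $α_i$ for the skill of agent $i$ and $P_α$ for the unknown skill density are notation chosen here for illustration; they are not taken from the abstract.

```latex
% Bradley-Terry-Luce outcome model with random skills (notation assumed)
\alpha_1, \dots, \alpha_n \overset{\text{i.i.d.}}{\sim} P_\alpha,
\qquad
\Pr(i \text{ beats } j \mid \alpha_i, \alpha_j) = \frac{\alpha_i}{\alpha_i + \alpha_j}.

% Worked example: \alpha_i = 3, \alpha_j = 1
\Pr(i \text{ beats } j) = \frac{3}{3 + 1} = 0.75,
\qquad
\Pr(j \text{ beats } i) = \frac{1}{3 + 1} = 0.25.
```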

preprint | 2020 | arXiv

Learning RUMs: Reducing Mixture to Single Component via PCA

We consider the problem of learning a mixture of Random Utility Models (RUMs). Despite the success of RUMs in various domains and the versatility of mixture RUMs to capture the heterogeneity in preferences, there has been only limited progress in learning a mixture of RUMs from partial data such as pairwise comparisons. In contrast, there have been significant advances in terms of learning a single component RUM using pairwise comparisons. In this paper, we aim to bridge this gap between mixture learning and single component learning of RUM by developing a "reduction" procedure. We propose to utilize PCA-based spectral clustering that simultaneously "de-noises" pairwise comparison data. We prove that our algorithm manages to cluster the partial data correctly (i.e., comparisons from the same RUM component are grouped in the same cluster) with high probability even when data is generated from a possibly heterogeneous mixture of well-separated generic RUMs. Both the time and the sample complexities scale polynomially in model parameters including the number of items. Two key features in the analysis are in establishing (1) a meaningful upper bound on the sub-Gaussian norm for RUM components embedded into the vector space of pairwise marginals and (2) the robustness of PCA with missing values in the $L_{2, \infty}$ sense, which might be of interest in their own right.

preprint | 2020 | arXiv

Non-Asymptotic Analysis of Monte Carlo Tree Search

In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo Tree Search (MCTS), in the context of infinite-horizon discounted cost Markov Decision Process (MDP). While MCTS is believed to provide an approximate value function for a given state with enough simulations, the claimed proof in the seminal works is incomplete. This is due to the fact that the variant, the Upper Confidence Bound for Trees (UCT), analyzed in prior works utilizes a "logarithmic" bonus term for balancing exploration and exploitation within the tree-based search, following the insights from the stochastic multi-arm bandit (MAB) literature. In effect, such an approach assumes that the regret of the underlying recursively dependent non-stationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold as pointed out in the literature, even for stationary MABs. As the key contribution of this work, we establish a polynomial concentration property of regret for a class of non-stationary MABs. This in turn establishes that MCTS with an appropriate polynomial rather than logarithmic bonus term in UCB has the claimed property. Using this as a building block, we argue that MCTS, combined with nearest neighbor supervised learning, acts as a "policy improvement" operator: it iteratively improves value function approximation for all states, due to combining with supervised learning, despite evaluating at only finitely many states. In effect, we establish that to learn an $\varepsilon$ approximation of the value function with respect to $\ell_\infty$ norm, MCTS combined with nearest neighbor requires a sample size scaling as $\widetilde{O}\big(\varepsilon^{-(d+4)}\big)$, where $d$ is the dimension of the state space. This is nearly optimal due to a minimax lower bound of $\widetilde{Ω}\big(\varepsilon^{-(d+2)}\big)$.

preprint | 2020 | arXiv

On Reinforcement Learning for Turn-based Zero-sum Markov Games

We consider the problem of finding the Nash equilibrium for two-player turn-based zero-sum games. Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach. Specifically, we propose the Explore-Improve-Supervise (EIS) method that combines "exploration", "policy improvement" and "supervised learning" to find the value function and policy associated with the Nash equilibrium. We identify sufficient conditions for convergence and correctness for such an approach. For a concrete instance of EIS where random policy is used for "exploration", Monte-Carlo Tree Search is used for "policy improvement" and Nearest Neighbors is used for "supervised learning", we establish that this method finds an $\varepsilon$-approximate value function of the Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state-space of the game is continuous and $d$-dimensional. This is nearly optimal as we establish a lower bound of $\widetilde{Ω}(\varepsilon^{-(d+2)})$ for any policy.

preprint | 2020 | arXiv

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces under a generative model. If $Q$-function is Lipschitz continuous, then the minimal sample complexity for estimating $ε$-optimal $Q$-function is known to scale as $Ω(\frac{1}{ε^{d_1+d_2 +2}})$ per classical non-parametric learning theory, where $d_1$ and $d_2$ denote the dimensions of the state and action spaces respectively. The $Q$-function, when viewed as a kernel, induces a Hilbert-Schmidt operator and hence possesses square-summable spectrum. This motivates us to consider a parametric class of $Q$-functions parameterized by its "rank" $r$, which contains all Lipschitz $Q$-functions as $r \to \infty$. As our key contribution, we develop a simple, iterative learning algorithm that finds $ε$-optimal $Q$-function with sample complexity of $\widetilde{O}(\frac{1}{ε^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank $r$ and the discounting factor $γ$ is below a certain threshold. Thus, this provides an exponential improvement in sample complexity. To enable our result, we develop a novel Matrix Estimation algorithm that faithfully estimates an unknown low-rank matrix in the $\ell_\infty$ sense even in the presence of arbitrary bounded noise, which might be of interest in its own right. Empirical results on several stochastic control tasks confirm the efficacy of our "low-rank" algorithms.

preprint | 2020 | arXiv

Stable Reinforcement Learning with Unbounded State Space

We consider the problem of reinforcement learning (RL) with unbounded state space motivated by the classical problem of scheduling in a queueing network. Traditional policies, as well as error metrics that are designed for finite, bounded or compact state spaces, require infinite samples for providing any meaningful performance guarantee (e.g. $\ell_\infty$ error) for unbounded state space. That is, we need a new notion of performance metric. As the main contribution of this work, inspired by the literature in queueing systems and control theory, we propose stability as the notion of "goodness": the state dynamics under the policy should remain in a bounded region with high probability. As a proof of concept, we propose an RL policy using a Sparse-Sampling-based Monte Carlo Oracle and argue that it satisfies the stability property as long as the system dynamics under the optimal policy respects a Lyapunov function. The assumption of existence of a Lyapunov function is not restrictive as it is equivalent to the positive recurrence or stability property of any Markov chain, i.e., if there is any policy that can stabilize the system then it must possess a Lyapunov function. Moreover, our policy does not utilize the knowledge of the specific Lyapunov function. To make our method sample efficient, we provide an improved, sample efficient Sparse-Sampling-based Monte Carlo Oracle with Lipschitz value function that may be of interest in its own right. Furthermore, we design an adaptive version of the algorithm, based on carefully constructed statistical tests, which finds the correct tuning parameter automatically.

preprint | 2020 | arXiv

Two Burning Questions on COVID-19: Did shutting down the economy help? Can we (partially) reopen the economy without risking the second wave?

As we reach the apex of the COVID-19 pandemic, the most pressing question facing us is: can we even partially reopen the economy without risking a second wave? We first need to understand if shutting down the economy helped. And if it did, is it possible to achieve similar gains in the war against the pandemic while partially opening up the economy? To do so, it is critical to understand the effects of the various interventions that can be put into place and their corresponding health and economic implications. Since many interventions exist, the key challenge facing policy makers is understanding the potential trade-offs between them, and choosing the particular set of interventions that works best for their circumstance. In this memo, we provide an overview of Synthetic Interventions (a natural generalization of Synthetic Control), a data-driven and statistically principled method to perform what-if scenario planning, i.e., for policy makers to understand the trade-offs between different interventions before having to actually enact them. In essence, the method leverages information from different interventions that have already been enacted across the world and fits it to a policy maker's setting of interest, e.g., to estimate the effect of mobility-restricting interventions on the U.S., we use daily death data from countries that enforced severe mobility restrictions to create a "synthetic low mobility U.S." and predict the counterfactual trajectory of the U.S. if it had indeed applied a similar intervention. Using Synthetic Interventions, we find that lifting severe mobility restrictions and only retaining moderate mobility restrictions (at retail and transit locations), seems to effectively flatten the curve. We hope this provides guidance on weighing the trade-offs between the safety of the population, strain on the healthcare system, and impact on the economy.

preprint | 2016 | arXiv

Regret Guarantees for Item-Item Collaborative Filtering

There is much empirical evidence that item-item collaborative filtering works well in practice. Motivated to understand this, we provide a framework to design and analyze various recommendation algorithms. The setup amounts to online binary matrix completion, where at each time a random user requests a recommendation and the algorithm chooses an entry to reveal in the user's row. The goal is to minimize regret, or equivalently to maximize the number of +1 entries revealed at any time. We analyze an item-item collaborative filtering algorithm that can achieve fundamentally better performance compared to user-user collaborative filtering. The algorithm achieves good "cold-start" performance (appropriately defined) by quickly making good recommendations to new users about whom there is little information.

preprint | 2015 | arXiv

A Latent Source Model for Patch-Based Image Segmentation

Despite the popularity and empirical success of patch-based nearest-neighbor and weighted majority voting approaches to medical image segmentation, there has been no theoretical development on when, why, and how well these nonparametric methods work. We bridge this gap by providing a theoretical performance guarantee for nearest-neighbor and weighted majority voting segmentation under a new probabilistic model for patch-based image segmentation. Our analysis relies on a new local property for how similar nearby patches are, and fuses existing lines of work on modeling natural imagery patches and theory for nonparametric classification. We use the model to derive a new patch-based segmentation algorithm that iterates between inferring local label patches and merging these local segmentations to produce a globally consistent image segmentation. Many existing patch-based algorithms arise as special cases of the new algorithm.

preprint | 2015 | arXiv

Approximating the Stationary Probability of a Single State in a Markov chain

In this paper, we present a novel iterative Monte Carlo method for approximating the stationary probability of a single state of a positive recurrent Markov chain. We utilize the characterization that the stationary probability of a state $i$ is inversely proportional to the expected return time of a random walk beginning at $i$. Our method obtains an $ε$-multiplicative close estimate with probability greater than $1 - α$ using at most $\tilde{O}\left(t_{\text{mix}} \ln(1/α) / π_i ε^2 \right)$ simulated random walk steps on the Markov chain across all iterations, where $t_{\text{mix}}$ is the standard mixing time and $π_i$ is the stationary probability. In addition, the estimate at each iteration is guaranteed to be an upper bound with high probability, and is decreasing in expectation with the iteration count, allowing us to monitor the progress of the algorithm and design effective termination criteria. We propose a termination criterion which guarantees an $ε(1 + 4 \ln(2) t_{\text{mix}})$ multiplicative error performance for states with stationary probability larger than $Δ$, while providing an additive error for states with stationary probability less than $Δ\in (0,1)$. The algorithm along with this termination criterion uses at most $\tilde{O}\left(\frac{\ln(1/α)}{ε^2} \min\left(\frac{t_{\text{mix}}}{π_i}, \frac{1}{εΔ}\right)\right)$ simulated random walk steps, which is bounded by a constant with respect to the Markov chain. We provide a tight analysis of our algorithm based on a locally weighted variant of the mixing time. Our results naturally extend to countably infinite state space Markov chains via Lyapunov function analysis.
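
The return-time characterization used above, $π_i = 1 / \mathbb{E}[T_i]$ where $T_i$ is the first return time to state $i$, already suggests a naive Monte Carlo estimator. The sketch below implements only that naive version, with a fixed number of walks and a fixed truncation length; the iterative schedule, termination criterion, and guarantees described in the abstract are not reproduced. The function name and all parameters are assumptions made for the example.

```python
import numpy as np

def stationary_prob_via_return_time(P, i, num_walks=2000, max_steps=10_000, seed=0):
    """Estimate pi_i as 1 / (average return time to state i).

    P is a row-stochastic transition matrix. Walks that have not returned
    after max_steps are truncated, which biases the estimate upward.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    total_steps = 0
    for _ in range(num_walks):
        state, steps = i, 0
        while True:
            state = rng.choice(n, p=P[state])  # take one random-walk step
            steps += 1
            if state == i or steps >= max_steps:
                break
        total_steps += steps
    return num_walks / total_steps
```

For the two-state chain with rows `[0.9, 0.1]` and `[0.5, 0.5]`, balance gives $π_0 = 5/6$, so the estimate for state 0 should land near 0.83 for a reasonable number of walks.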

preprint2015arXiv

Finding Rumor Sources on Random Trees

We consider the problem of detecting the source of a rumor which has spread in a network using only observations about which set of nodes are infected with the rumor and with no information as to \emph{when} these nodes became infected. In a recent work \citep{ref:rc} this rumor source detection problem was introduced and studied. The authors proposed the graph score function {\em rumor centrality} as an estimator for detecting the source. They establish it to be the maximum likelihood estimator with respect to the popular Susceptible Infected (SI) model with exponential spreading times for regular trees. They showed that as the size of the infected graph increases, for a path graph (2-regular tree), the probability of source detection goes to $0$ while for $d$-regular trees with $d \geq 3$ the probability of detection, say $α_d$, remains bounded away from $0$ and is less than $1/2$. However, their results stop short of providing insights for the performance of the rumor centrality estimator in more general settings such as irregular trees or the SI model with non-exponential spreading times. This paper overcomes this limitation and establishes the effectiveness of rumor centrality for source detection for generic random trees and the SI model with a generic spreading time distribution. The key result is an interesting connection between a continuous time branching process and the effectiveness of rumor centrality. Through this, it is possible to quantify the detection probability precisely. As a consequence, we recover all previous results as a special case and obtain a variety of novel results including the {\em universality} of rumor centrality in the context of tree-like graphs and the SI model with a generic spreading time distribution.
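
As a hedged illustration of the score discussed above, the sketch below computes rumor centrality on a small tree using the standard closed form from the cited earlier work, $R(v) = n! / \prod_u T_u^v$, where $T_u^v$ is the size of the subtree rooted at $u$ when the tree is rooted at $v$. The adjacency-dictionary input format and the function name are assumptions for this example.

```python
from math import factorial

def rumor_centrality(adj, root):
    """Rumor centrality of `root` in a tree given as {node: [neighbors]}.

    Uses R(root) = n! / prod_u (size of u's subtree when rooted at `root`).
    """
    parent = {root: None}
    order = []
    stack = [root]
    while stack:  # iterative DFS; parents appear before children in `order`
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                stack.append(w)
    size = {u: 1 for u in adj}
    for u in reversed(order):  # accumulate subtree sizes bottom-up
        if parent[u] is not None:
            size[parent[u]] += size[u]
    product = 1
    for u in adj:
        product *= size[u]
    return factorial(len(adj)) // product

path = {0: [1], 1: [0, 2], 2: [1]}
scores = {v: rumor_centrality(path, v) for v in path}
print(scores)  # the middle node of the path gets the highest score
```

The estimator then picks the infected node with the largest score as the source.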

preprint2015arXiv

Rank Centrality: Ranking from Pair-wise Comparisons

The question of aggregating pair-wise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR's TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining a ranking, finding `scores' for each object (e.g. player's rating) is of interest for understanding the intensity of the preferences. In this paper, we propose Rank Centrality, an iterative rank aggregation algorithm for discovering scores for objects (or items) from pair-wise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with an edge present between a pair of objects if they are compared; the score, which we call Rank Centrality, of an object turns out to be its stationary probability under this random walk. To study the efficacy of the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model (equivalent to the Multinomial Logit (MNL) for pair-wise comparisons) in which each object has an associated score which determines the probabilistic outcomes of pair-wise comparisons between objects. In terms of the pair-wise marginal probabilities, which is the main subject of this paper, the MNL model and the BTL model are identical. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. In particular, the number of samples required to learn the score well with high probability depends on the structure of the comparison graph. When the Laplacian of the comparison graph has a strictly positive spectral gap, e.g. each item is compared to a subset of randomly chosen items, this leads to dependence on the number of samples that is nearly order-optimal.
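
The random walk interpretation can be sketched as follows. This is a minimal illustration, assuming the transition probability from $i$ to $j$ is the fraction of $i$-versus-$j$ comparisons won by $j$, divided by the maximum degree of the comparison graph; the input format, the function name, and the use of plain power iteration are assumptions for this example.

```python
def rank_centrality(win_frac, iters=2000):
    """Scores as the stationary distribution of a comparison random walk.

    win_frac[i][j] is the fraction of i-vs-j comparisons won by j, or None
    if i and j were never compared (hypothetical input format).
    """
    n = len(win_frac)
    d_max = max(
        sum(1 for j in range(n) if j != i and win_frac[i][j] is not None)
        for i in range(n)
    )
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and win_frac[i][j] is not None:
                P[i][j] = win_frac[i][j] / d_max
        P[i][i] = 1.0 - sum(P[i])  # self-loop absorbs the remaining mass
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration: pi <- pi P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Exact BTL win probabilities for weights w: item j beats i w.p. w_j / (w_i + w_j).
w = [1.0, 2.0, 3.0]
fracs = [[None if i == j else w[j] / (w[i] + w[j]) for j in range(3)] for i in range(3)]
print(rank_centrality(fracs))
```

With exact BTL probabilities the walk is reversible with respect to the normalized weights, so the scores recover $w / \sum_k w_k$.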

preprint2014arXiv

A Latent Source Model for Online Collaborative Filtering

Despite the prevalence of collaborative filtering in recommendation systems, there has been little theoretical development on why and how well it works, especially in the "online" setting, where items are recommended to users over time. We address this theoretical gap by introducing a model for online recommendation systems, cast item recommendation under the model as a learning problem, and analyze the performance of a cosine-similarity collaborative filtering method. In our model, each of $n$ users either likes or dislikes each of $m$ items. We assume there to be $k$ types of users, and all the users of a given type share a common string of probabilities determining the chance of liking each item. At each time step, we recommend an item to each user, where a key distinction from related bandit literature is that once a user consumes an item (e.g., watches a movie), then that item cannot be recommended to the same user again. The goal is to maximize the number of likable items recommended to users over time. Our main result establishes that after nearly $\log(km)$ initial learning time steps, a simple collaborative filtering algorithm achieves essentially optimal performance without knowing $k$. The algorithm has an exploitation step that uses cosine similarity and two types of exploration steps, one to explore the space of items (standard in the literature) and the other to explore similarity between users (novel to this work).
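
The similarity measure behind the exploitation step can be sketched briefly. The encoding below (+1 for like, -1 for dislike, 0 for an item not yet consumed) and the function name are assumptions for illustration; the exploration steps and the recommendation rule itself are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two users' rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a user with no ratings is treated as dissimilar to everyone
    return dot / (norm_u * norm_v)

alice = [1, -1, 0, 1]
bob = [1, -1, 1, 0]
carol = [-1, 1, 0, -1]
print(cosine_similarity(alice, bob), cosine_similarity(alice, carol))
```

Users with a high similarity to the target user would then be treated as neighbors whose ratings inform the next recommendation.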

preprint2014arXiv

Bayesian regression and Bitcoin

In this paper, we discuss the method of Bayesian regression and its efficacy for predicting price variation of Bitcoin, a recently popularized virtual, cryptographic currency. Bayesian regression refers to utilizing empirical data as a proxy to perform Bayesian inference. We utilize Bayesian regression for the so-called "latent source model". Bayesian regression for the "latent source model" was introduced and discussed by Chen, Nikolov and Shah (2013) and Bresler, Chen and Shah (2014) for the purpose of binary classification. They established the theoretical as well as empirical efficacy of the method in the setting of binary classification. In this paper, we instead utilize it for predicting a real-valued quantity, the price of Bitcoin. Based on this price prediction method, we devise a simple strategy for trading Bitcoin. The strategy is able to nearly double the investment in less than a 60-day period when run against a real data trace.

preprint2014arXiv

Hardness of parameter estimation in graphical models

We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pair-wise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known. Our proof gives a polynomial time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others).

preprint2014arXiv

Learning graphical models from the Glauber dynamics

In this paper we consider the problem of learning undirected graphical models from data generated according to the Glauber dynamics. The Glauber dynamics is a Markov chain that sequentially updates individual nodes (variables) in a graphical model and it is frequently used to sample from the stationary distribution (to which it converges given sufficient time). Additionally, the Glauber dynamics is a natural dynamical model in a variety of settings. This work deviates from the standard formulation of graphical model learning in the literature, where one assumes access to i.i.d. samples from the distribution. Much of the research on graphical model learning has been directed towards finding algorithms with low computational cost. As the main result of this work, we establish that the problem of reconstructing binary pairwise graphical models is computationally tractable when we observe the Glauber dynamics. Specifically, we show that a binary pairwise graphical model on $p$ nodes with maximum degree $d$ can be learned in time $f(d)p^2\log p$, for a function $f(d)$, using nearly the information-theoretic minimum number of samples.
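
A single update of the Glauber dynamics can be sketched as below, assuming a binary pairwise (Ising) model with spins in {-1, +1} and distribution proportional to $\exp(\sum_{(i,j)} θ_{ij} x_i x_j)$. The data layout and function name are assumptions; the learning algorithm from the paper is not shown.

```python
import math
import random

def glauber_step(x, neighbors, theta, rng):
    """Resample one uniformly chosen node from its conditional distribution.

    x: list of spins in {-1, +1}, modified in place.
    neighbors: {node: [adjacent nodes]}.
    theta: {(i, j): coupling} with i < j (hypothetical layout).
    Returns the index of the updated node.
    """
    i = rng.randrange(len(x))
    field = sum(theta[(min(i, j), max(i, j))] * x[j] for j in neighbors[i])
    # P(x_i = +1 | rest) = e^field / (e^field + e^-field)
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    x[i] = 1 if rng.random() < p_plus else -1
    return i

rng = random.Random(1)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
theta = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.2}
x = [1, -1, 1]
trajectory = []
for _ in range(5):
    node = glauber_step(x, neighbors, theta, rng)
    trajectory.append((node, list(x)))
print(trajectory)
```

The observed data in this setting is such a trajectory of single-site updates rather than independent samples from the stationary distribution.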

preprint2014arXiv

Learning Mixed Multinomial Logit Model from Ordinal Data

Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) models, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise comparisons). Despite its long standing importance across disciplines including social choice, operations research and revenue management, little is known about this question. In the case of single MNL models (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible. However, even learning a mixture with two MNL components is infeasible in general. Given this state of affairs, we seek conditions under which it is feasible to learn the mixture model in a manner that is both computationally and statistically efficient. We present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preferences/comparisons data. In particular, a mixture of $r$ MNL components over $n$ objects can be learnt using samples whose size scales polynomially in $n$ and $r$ (concretely, $r^{3.5}n^3(\log n)^4$, with $r\ll n^{2/7}$ when the model parameters are sufficiently incoherent). The algorithm has two phases: first, learn the pair-wise marginals for each component using tensor decomposition; second, learn the model parameters for each component using Rank Centrality introduced by Negahban et al. In the process of proving these results, we obtain a generalization of existing analysis for tensor decomposition to a more realistic regime where only partial information about each sample is available.

preprint2014arXiv

On Queue-Size Scaling for Input-Queued Switches

We study the optimal scaling of the expected total queue size in an $n\times n$ input-queued switch, as a function of the number of ports $n$ and the load factor $ρ$, which has been conjectured to be $Θ(n/(1-ρ))$. In a recent work, the validity of this conjecture has been established for the regime where $1-ρ= O(1/n^2)$. In this paper, we make further progress in the direction of this conjecture. We provide a new class of scheduling policies under which the expected total queue size scales as $O(n^{1.5}(1-ρ)^{-1}\log(1/(1-ρ)))$ when $1-ρ= O(1/n)$. This is an improvement over the state of the art; for example, for $ρ= 1 - 1/n$ the best known bound was $O(n^3)$, while ours is $O(n^{2.5}\log n)$.
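
The example at the end of the abstract follows by direct substitution. With $ρ = 1 - 1/n$ we have $1-ρ = 1/n$, so the stated bound and the conjectured scaling evaluate as:

```latex
\begin{align*}
n^{1.5}\,(1-\rho)^{-1}\,\log\!\left(\frac{1}{1-\rho}\right)
  &= n^{1.5}\cdot n \cdot \log n
   = n^{2.5}\log n, \\
\frac{n}{1-\rho} &= n \cdot n = n^{2}.
\end{align*}
```

At this load the new bound is therefore within a factor of $n^{0.5}\log n$ of the conjectured $Θ(n^2)$ scaling, compared with a factor of $n$ for the earlier $O(n^3)$ bound.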

preprint2014arXiv

Statistical inference with probabilistic graphical models

These are notes from the lecture of Devavrat Shah given at the autumn school "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", that took place in Les Houches, France from Monday September 30th, 2013, till Friday October 11th, 2013. The school was organized by Florent Krzakala from UPMC & ENS Paris, Federico Ricci-Tersenghi from La Sapienza Roma, Lenka Zdeborova from CEA Saclay & CNRS, and Riccardo Zecchina from Politecnico Torino. This lecture of Devavrat Shah (MIT) covers the basics of inference and learning. It explains how inference problems are represented within structures known as graphical models. The theoretical basis of the belief propagation algorithm is then explained and derived. This lecture sets the stage for generalizations and applications of message passing algorithms.

preprint2014arXiv

Structure learning of antiferromagnetic Ising models

In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. We first observe that the notoriously difficult problem of learning parities with noise can be captured as a special case of learning graphical models. This leads to an unconditional computational lower bound of $Ω(p^{d/2})$ for learning general graphical models on $p$ nodes of maximum degree $d$, for the class of so-called statistical algorithms recently introduced by Feldman et al (2013). The lower bound suggests that the $O(p^d)$ runtime required to exhaustively search over neighborhoods cannot be significantly improved without restricting the class of models. Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., many recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari (2009) showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the opposite behavior: very strong interaction allows efficient learning in time $O(p^2)$. We provide an algorithm whose performance interpolates between $O(p^2)$ and $O(p^{d+2})$ depending on the strength of the repulsion.

preprint2013arXiv

A Latent Source Model for Nonparametric Time Series Classification

For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren't actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a "weighted majority voting" classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such "trending topics" in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%.
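
A weighted majority voting rule of the kind described above can be sketched as follows. The sketch assumes that each labeled training series votes for its label with weight $\exp(-γ d)$, where $d$ is the squared Euclidean distance to the observed prefix; this weighting, the parameter `gamma`, and the omission of time shifts are simplifying assumptions and not a reproduction of the paper's exact rule.

```python
import math

def weighted_majority_vote(observed, labeled_series, gamma=1.0):
    """Classify a partially observed series as +1 or -1.

    labeled_series: list of (series, label) pairs with label in {+1, -1}.
    Each training series is compared on the first len(observed) samples.
    """
    score = {+1: 0.0, -1: 0.0}
    t = len(observed)
    for series, label in labeled_series:
        dist_sq = sum((a - b) ** 2 for a, b in zip(observed, series[:t]))
        score[label] += math.exp(-gamma * dist_sq)
    return +1 if score[+1] >= score[-1] else -1

training = [
    ([0.0, 0.0, 0.0, 0.0], -1),
    ([1.0, 1.0, 1.0, 1.0], +1),
    ([0.9, 1.1, 1.0, 1.0], +1),
]
print(weighted_majority_vote([1.0, 1.0], training))
print(weighted_majority_vote([0.1, 0.0], training))
```

As `gamma` grows, the vote is dominated by the closest training series, which is one way to see the connection to nearest-neighbor classification.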

preprint2013arXiv

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all such systems must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in an appropriate manner, e.g. majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly outperforms majority voting and, in fact, is optimal through comparison to an oracle that knows the reliability of every worker. Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks. By adaptively deciding which questions to ask to the next arriving worker, one might hope to reduce uncertainty more efficiently. We show that, perhaps surprisingly, the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and non-adaptive scenarios. Hence, our non-adaptive approach is order-optimal under both scenarios. This strongly relies on the fact that workers are fleeting and can not be exploited. Therefore, architecturally, our results suggest that building a reliable worker-reputation system is essential to fully harnessing the potential of adaptive designs.

preprint2013arXiv

Partition-Merge: Distributed Inference and Modularity Optimization

This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then stitches the resulting solutions to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems essentially run in time linear in the number of nodes in the graph, and perform as well -- or even better -- than the original centralized algorithm as long as the graph has geometric structures. Here we say a graph has geometric structures, or polynomial growth property, when the number of nodes within distance r of any given node grows no faster than a polynomial function of r. More precisely, if the centralized algorithm is a C-factor approximation with constant C \ge 1, the resulting distributed algorithm is a (C+δ)-factor approximation for any small δ>0; but if the centralized algorithm is a non-constant (e.g. logarithmic) factor approximation, then the resulting distributed algorithm becomes a constant factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm.

preprint2012arXiv

Belief Propagation for Min-cost Network Flow: Convergence and Correctness

Message passing type algorithms such as the so-called Belief Propagation algorithm have recently gained a lot of attention in the statistics, signal processing and machine learning communities as attractive algorithms for solving a variety of optimization and inference problems. As a decentralized, easy to implement and empirically successful algorithm, BP deserves attention from the theoretical standpoint, and here not much is known at the present stage. In order to fill this gap we consider the performance of the BP algorithm in the context of the capacitated minimum-cost network flow problem, a classical problem in the operations research field. We prove that BP converges to the optimal solution in pseudo-polynomial time, provided that the optimal solution of the underlying problem is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for the same problem, which no longer requires the uniqueness of the optimal solution. This is the first instance where BP is proved to have fully-polynomial running time. Our results thus provide a theoretical justification for the viability of BP as an attractive method to solve an important class of optimization problems.

preprint2012arXiv

De-randomizing Shannon: The Design and Analysis of a Capacity-Achieving Rateless Code

This paper presents an analysis of spinal codes, a class of rateless codes proposed recently. We prove that spinal codes achieve Shannon capacity for the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel with an efficient polynomial-time encoder and decoder. They are the first rateless codes with proofs of these properties for BSC and AWGN. The key idea in the spinal code is the sequential application of a hash function over the message bits. The sequential structure of the code turns out to be crucial for efficient decoding. Moreover, counter to the wisdom of having an expander structure in good codes, we show that the spinal code, despite its sequential structure, achieves capacity. The pseudo-randomness provided by a hash function suffices for this purpose. Our proof introduces a variant of Gallager's result characterizing the error exponent of random codes for any memoryless channel. We present a novel application of these error-exponent results within the framework of an efficient sequential code. The application of a hash function over the message bits provides a methodical and effective way to de-randomize Shannon's random codebook construction.
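
The sequential hashing idea can be sketched as a chain of states in which each state depends on the previous state and the next block of message bits. The sketch below is only an illustration of that structure: the use of SHA-256, the block size `k`, the fixed initial state, and the function name are all assumptions, and the mapping from states to transmitted symbols is omitted.

```python
import hashlib

def spine(message_bits, k, seed=b"\x00"):
    """Return the chain of states s_i = h(s_{i-1}, block_i).

    message_bits: string of '0'/'1' characters, split into k-bit blocks.
    SHA-256 is a stand-in hash chosen for this sketch.
    """
    states = []
    state = seed
    for start in range(0, len(message_bits), k):
        block = message_bits[start:start + k]
        state = hashlib.sha256(state + block.encode("ascii")).digest()
        states.append(state)
    return states

a = spine("10110100", 4)
b = spine("10111111", 4)
print(a[0] == b[0], a[1] == b[1])
```

Two messages that share a prefix share the corresponding states, and they diverge from the first differing block onward, which is the sequential structure a decoder can exploit.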

preprint2012arXiv

Switched networks with maximum weight policies: Fluid approximation and multiplicative state space collapse

We consider a queueing network in which there are constraints on which queues may be served simultaneously; such networks may be used to model input-queued switches and wireless networks. The scheduling policy for such a network specifies which queues to serve at any point in time. We consider a family of scheduling policies, related to the maximum-weight policy of Tassiulas and Ephremides [IEEE Trans. Automat. Control 37 (1992) 1936--1948], for single-hop and multihop networks. We specify a fluid model and show that fluid-scaled performance processes can be approximated by fluid model solutions. We study the behavior of fluid model solutions under critical load, and characterize invariant states as those states which solve a certain network-wide optimization problem. We use fluid model results to prove multiplicative state space collapse. A notable feature of our results is that they do not assume complete resource pooling.
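
For the input-queued switch special case, the basic maximum-weight rule can be sketched as follows: among all matchings of inputs to outputs, serve the one whose queues have the largest total backlog. The exhaustive search over permutations and the function name are assumptions made to keep the example short; the family of policies studied in the paper is more general.

```python
from itertools import permutations

def max_weight_schedule(queues):
    """Pick the input-to-output matching with the largest total backlog.

    queues[i][j] is the backlog at input i destined for output j.
    Returns a tuple m where input i is matched to output m[i].
    """
    n = len(queues)
    return max(
        permutations(range(n)),
        key=lambda m: sum(queues[i][m[i]] for i in range(n)),
    )

backlog = [
    [1, 2, 9],
    [8, 1, 1],
    [1, 7, 1],
]
print(max_weight_schedule(backlog))
```

Enumerating all $n!$ matchings is only practical for very small $n$; a matching algorithm would be used at realistic sizes.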

preprint2011arXiv

A Nonparametric Approach to Modeling Choice with Limited Data

A central push in operations models over the last decade has been the incorporation of models of customer choice. Real world implementations of many of these models face the formidable stumbling block of simply identifying the `right' model of choice to use. Thus motivated, we visit the following problem: For a `generic' model of consumer choice (namely, distributions over preference lists) and a limited amount of data on how consumers actually make decisions (such as marginal information about these distributions), how may one predict revenues from offering a particular assortment of choices? We present a framework to answer such questions and design a number of tractable algorithms from a data and computational standpoint for the same. This paper thus takes a significant step towards `automating' the crucial task of choice model selection in the context of operational decision problems.

preprint2011arXiv

Assortment Optimization Under General Choice

We consider the problem of static assortment optimization, where the goal is to find the assortment of size at most $C$ that maximizes revenues. This is a fundamental decision problem in the area of Operations Management. It has been shown that this problem is provably hard for most of the important families of parametric choice models, except the multinomial logit (MNL) model. In addition, most of the approximation schemes proposed in the literature are tailored to a specific parametric structure. We deviate from this and propose a general algorithm to find the optimal assortment assuming access to only a subroutine that gives revenue predictions; this means that the algorithm can be applied with any choice model. We prove that when the underlying choice model is the MNL model, our algorithm can find the optimal assortment efficiently.
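
The problem setup, with revenue available only through a subroutine, can be made concrete with a small sketch. The search below is plain exhaustive enumeration and is not the paper's algorithm; it is included only as a baseline that treats the revenue function as a black box. The MNL revenue formula is the standard one, while the function names and the example numbers are assumptions.

```python
from itertools import combinations

def mnl_revenue(assortment, prices, weights, no_purchase_weight=1.0):
    """Expected revenue of an assortment under an MNL choice model."""
    denom = no_purchase_weight + sum(weights[i] for i in assortment)
    return sum(prices[i] * weights[i] for i in assortment) / denom

def best_assortment(num_items, capacity, revenue):
    """Exhaustively search assortments of size at most `capacity`.

    `revenue` is any black-box function mapping a tuple of items to a number.
    """
    best, best_rev = (), 0.0
    for size in range(1, capacity + 1):
        for subset in combinations(range(num_items), size):
            r = revenue(subset)
            if r > best_rev:
                best, best_rev = subset, r
    return best, best_rev

prices = [10.0, 8.0, 2.0]
weights = [1.0, 1.0, 5.0]
print(best_assortment(3, 2, lambda s: mnl_revenue(s, prices, weights)))
```

Because `best_assortment` only calls `revenue`, any other choice model can be swapped in without changing the search code, at the cost of a number of evaluations that grows combinatorially with the capacity.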

preprint2011arXiv

Caching in Wireless Networks

We consider the problem of delivering content cached in a wireless network of n nodes randomly located on a square of area n. The network performance is described by the n2^n-dimensional caching capacity region of the wireless network. We provide an inner bound on this caching capacity region, and, in the high path-loss regime, a matching (in the scaling sense) outer bound. For large path-loss exponent, this provides an information-theoretic scaling characterization of the entire caching capacity region. The proposed communication scheme achieving the inner bound shows that the problems of cache selection and channel coding can be solved separately without loss of order-optimality. On the other hand, our results show that the common architecture of nearest-neighbor cache selection can be arbitrarily bad, implying that cache selection and load balancing need to be performed jointly.

preprint2011arXiv

Efficient Distributed Medium Access

Consider a wireless network of n nodes represented by a graph G=(V, E) where an edge (i,j) models the fact that transmissions of i and j interfere with each other, i.e. simultaneous transmissions of i and j become unsuccessful. Hence it is required that at each time instance a set of non-interfering nodes (corresponding to an independent set in G) access the wireless medium. To utilize wireless resources efficiently, it is required to arbitrate the access of medium among interfering nodes properly. Moreover, to be of practical use, such a mechanism is required to be totally distributed as well as simple. As the main result of this paper, we provide such a medium access algorithm. It is randomized, totally distributed and simple: each node attempts to access medium at each time with probability that is a function of its local information. We establish efficiency of the algorithm by showing that the corresponding network Markov chain is positive recurrent as long as the demand imposed on the network can be supported by the wireless network (using any algorithm). In that sense, the proposed algorithm is optimal in terms of utilizing wireless resources. The algorithm is oblivious to the network graph structure, in contrast with the so-called `polynomial back-off' algorithm by Hastad-Leighton-Rogoff (STOC '87, SICOMP '96) that is established to be optimal for the complete graph and bipartite graphs (by Goldberg-MacKenzie (SODA '96, JCSS '99)).

preprint2011arXiv

Inferring Rankings Using Constrained Sensing

We consider the problem of recovering a function over the space of permutations (or, the symmetric group) over $n$ elements from given partial information; the partial information we consider is related to the group theoretic Fourier Transform of the function. This problem naturally arises in several settings such as ranked elections, multi-object tracking, ranking systems, and recommendation systems. Inspired by the work of Donoho and Stark in the context of discrete-time functions, we focus on non-negative functions with a sparse support (support size $\ll$ domain size). Our recovery method is based on finding the sparsest solution (through $\ell_0$ optimization) that is consistent with the available information. As the main result, we derive sufficient conditions for functions that can be recovered exactly from partial information through $\ell_0$ optimization. Under a natural random model for the generation of functions, we quantify the recoverability conditions by deriving bounds on the sparsity (support size) for which the function satisfies the sufficient conditions with a high probability as $n \to \infty$. $\ell_0$ optimization is computationally hard. Therefore, the popular compressive sensing literature considers solving the convex relaxation, $\ell_1$ optimization, to find the sparsest solution. However, we show that $\ell_1$ optimization fails to recover a function (even with constant sparsity) generated using the random model with a high probability as $n \to \infty$. In order to overcome this problem, we propose a novel iterative algorithm for the recovery of functions that satisfy the sufficient conditions. Finally, using an Information Theoretic framework, we study necessary conditions for exact recovery to be possible.

preprint2011arXiv

Sparse Choice Models

Choice models, which capture popular preferences over objects of interest, play a key role in making decisions whose eventual outcome is impacted by human choice behavior. In most scenarios, the choice model, which can effectively be viewed as a distribution over permutations, must be learned from observed data. The observed data, in turn, may frequently be viewed as (partial, noisy) information about marginals of this distribution over permutations. As such, the search for an appropriate choice model boils down to learning a distribution over permutations that is (near-)consistent with observed information about this distribution. In this work, we pursue a non-parametric approach which seeks to learn a choice model (i.e. a distribution over permutations) with {\em sparsest} possible support, and consistent with observed data. We assume that the data observed consists of noisy information pertaining to the marginals of the choice model we seek to learn. We establish that {\em any} choice model admits a `very' sparse approximation in the sense that there exists a choice model whose support is small relative to the dimension of the observed data and whose marginals approximately agree with the observed marginal information. We further show that under what we dub `signature' conditions, such a sparse approximation can be found in a computationally efficient fashion relative to a brute force approach. An empirical study using the American Psychological Association election data-set suggests that our approach manages to unearth useful structural properties of the underlying choice model using the sparse approximation found. Our results further suggest that the signature condition is a potential alternative to the recently popularized Restricted Null Space condition for efficient recovery of sparse models.

preprint2010arXiv

A Simple Message-Passing Algorithm for Compressed Sensing

We consider the recovery of a nonnegative vector x from measurements y = Ax, where A is an m-by-n matrix whose entries are in {0, 1}. We establish that when A corresponds to the adjacency matrix of a bipartite graph with sufficient expansion, a simple message-passing algorithm produces an estimate \hat{x} of x satisfying ||x-\hat{x}||_1 \leq O(n/k) ||x-x(k)||_1, where x(k) is the best k-sparse approximation of x. The algorithm performs O(n (log(n/k))^2 log(k)) computation in total, and the number of measurements required is m = O(k log(n/k)). In the special case when x is k-sparse, the algorithm recovers x exactly in time O(n log(n/k) log(k)). Ultimately, this work is a further step in the direction of more formally developing the broader role of message-passing algorithms in solving compressed sensing problems.

preprint2010arXiv

Efficient Queue-based CSMA with Collisions

Recently there has been considerable interest in the design of efficient carrier sense multiple access (CSMA) protocols for wireless networks. The basic assumption underlying recent results is the availability of perfect carrier sense information. This allows for the design of continuous-time algorithms under which collisions are avoided. The primary purpose of this note is to show how these results can be extended to the case when carrier sense information may not be perfect, or equivalently may be delayed. Specifically, an adaptation of the algorithm in Rajagopalan, Shah, Shin (2009) is presented here for a time-slotted setup with carrier sense information available only at the end of the time slot. To establish its throughput optimality, in addition to the method developed in Rajagopalan, Shah, Shin (2009), understanding the properties of the stationary distribution of a certain non-reversible Markov chain, as well as a bound on its mixing time, is essential. This note presents these key results. A longer version of this note will provide a detailed account of how this is incorporated with the methods of Rajagopalan, Shah, Shin (2009) to establish the positive recurrence of the underlying network Markov process. In addition, these results will help design optimal rate control in conjunction with CSMA in the presence of collisions, building upon the method of Jiang, Shah, Shin, Walrand (2009).

preprint2010arXiv

Fair Scheduling in Networks Through Packet Election

We consider the problem of designing a fair scheduling algorithm for discrete-time constrained queuing networks. Each queue has dedicated exogenous packet arrivals. There are constraints on which queues can be served simultaneously. This model effectively describes important special instances like network switches, interference in wireless networks, bandwidth sharing for congestion control and traffic scheduling in road roundabouts. Fair scheduling is required because it provides isolation to different traffic flows; isolation makes the system more robust and enables providing quality of service. Existing work on fairness for constrained networks concentrates on flow-based fairness. As a main result, we describe a notion of packet-based fairness by establishing an analogy with the ranked election problem: packets are voters, schedules are candidates and each packet ranks the schedules based on its priorities. We then obtain a scheduling algorithm that achieves the described notion of fairness by drawing upon the seminal work of Goodman and Markowitz (1952). This yields the familiar Maximum Weight (MW) style algorithm. As another important result, we prove that the algorithm obtained is throughput optimal. There is no reason a priori why this should be true, and the proof requires non-traditional methods.

preprint2010arXiv

On the Flow-level Dynamics of a Packet-switched Network

The packet is the fundamental unit of transportation in modern communication networks such as the Internet. Physical layer scheduling decisions are made at the level of packets, and packet-level models with exogenous arrival processes have long been employed to study network performance, as well as design scheduling policies that more efficiently utilize network resources. On the other hand, a user of the network is more concerned with end-to-end bandwidth, which is allocated through congestion control policies such as TCP. Utility-based flow-level models have played an important role in understanding congestion control protocols. In summary, these two classes of models have provided separate insights for flow-level and packet-level dynamics of a network.

preprint2010arXiv

Qualitative Properties of alpha-Weighted Scheduling Policies

We consider a switched network, a fairly general constrained queueing network model that has been used successfully to model the detailed packet-level dynamics in communication networks, such as input-queued switches and wireless networks. The main operational issue in this model is that of deciding which queues to serve, subject to certain constraints. In this paper, we study qualitative performance properties of the well known $α$-weighted scheduling policies. The stability, in the sense of positive recurrence, of these policies has been well understood. We establish exponential upper bounds on the tail of the steady-state distribution of the backlog. Along the way, we prove finiteness of the expected steady-state backlog when $α<1$, a property that was known only for $α\geq 1$. Finally, we analyze the excursions of the maximum backlog over a finite time horizon for $α\geq 1$. As a consequence, for $α\geq 1$, we establish the full state space collapse property.
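
As a rough illustration of what an α-weighted policy does in each time slot, the sketch below picks the feasible schedule that maximizes the sum of queue lengths raised to the power α. This follows the usual description of such policies; the function name, the 2x2 switch, and the queue values are assumptions for illustration.

```python
from typing import Sequence, Tuple


def alpha_weighted_schedule(queues: Sequence[float],
                            schedules: Sequence[Tuple[int, ...]],
                            alpha: float) -> Tuple[int, ...]:
    """Return the feasible schedule maximizing sum_i queues[i]**alpha * schedule[i]."""
    def weight(schedule: Tuple[int, ...]) -> float:
        return sum((q ** alpha) * served for q, served in zip(queues, schedule))
    return max(schedules, key=weight)


# Hypothetical 2x2 input-queued switch with queues ordered
# (in1->out1, in1->out2, in2->out1, in2->out2).
# The two feasible schedules are the two perfect matchings.
matchings = [(1, 0, 0, 1), (0, 1, 1, 0)]
queues = [9.0, 4.0, 4.0, 0.0]

choice_alpha_1 = alpha_weighted_schedule(queues, matchings, alpha=1.0)
choice_alpha_half = alpha_weighted_schedule(queues, matchings, alpha=0.5)
```

With α = 1 the weights are 9 versus 8, so the first matching is served; with α = 0.5 they are 3 versus 4, so the second is. Smaller α gives less priority to the single longest queue.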

preprint2010arXiv

Randomized Scheduling Algorithm for Queueing Networks

There has recently been considerable interest in the design of low-complexity, myopic, distributed and stable scheduling policies for constrained queueing network models that arise in the context of emerging communication networks. Here, we consider two representative models. One, a model for a collection of wireless nodes communicating through a shared medium, which represents the randomly varying number of packets in the queues at the nodes of the network. Two, a buffered circuit switched network model for an optical core of the future Internet, to capture the randomness in calls or flows present in the network. The maximum weight scheduling policy proposed by Tassiulas and Ephremides in 1992 leads to a myopic and stable policy for the packet-level wireless network model. But it is computationally very expensive (NP-hard) and centralized. It is not applicable to the buffered circuit switched network due to the requirement of non-preemption of the calls in service. As the main contribution of this paper, we present a stable scheduling algorithm for both of these models. The algorithm is myopic, distributed and performs few logical operations at each node per unit time.

preprint2010arXiv

Rumors in a Network: Who's the Culprit?

We provide a systematic study of the problem of finding the source of a rumor in a network. We model rumor spreading in a network with a variant of the popular SIR model and then construct an estimator for the rumor source. This estimator is based upon a novel topological quantity which we term rumor centrality. We establish that this is an ML estimator for a class of graphs. We find the following surprising threshold phenomenon: on trees which grow faster than a line, the estimator always has non-trivial detection probability, whereas on trees that grow like a line, the detection probability will go to 0 as the network grows. Simulations performed on synthetic networks such as the popular small-world and scale-free networks, and on real networks such as an internet AS network and the U.S. electric power grid network, show that the estimator either finds the source exactly or within a few hops of the true source across different network topologies. We compare rumor centrality to another common network centrality notion known as distance centrality. We prove that on trees, the rumor center and distance center are equivalent, but on general networks, they may differ. Indeed, simulations show that rumor centrality outperforms distance centrality in finding rumor sources in networks which are not tree-like.
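
On a tree, rumor centrality is commonly written as R(v) = n! / ∏_u T_u^v, where T_u^v is the size of the subtree rooted at u when the tree is rooted at v; it counts the infection orderings that start at v. The abstract does not state this formula, so the sketch below assumes it, and the function names and example trees are illustrative.

```python
from math import factorial
from typing import Dict, List


def rumor_centrality(tree: Dict[int, List[int]], root: int) -> int:
    """n! divided by the product of subtree sizes when `tree` is rooted at `root`."""
    parent = {root: None}
    order = []
    stack = [root]
    while stack:  # iterative DFS: every node is visited after its parent
        v = stack.pop()
        order.append(v)
        for u in tree[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    size = {v: 1 for v in tree}
    product = 1
    for v in reversed(order):  # children are handled before their parent
        product *= size[v]
        if parent[v] is not None:
            size[parent[v]] += size[v]
    return factorial(len(tree)) // product


def rumor_center(tree: Dict[int, List[int]]) -> int:
    """Node with the largest rumor centrality (ties broken by iteration order)."""
    return max(tree, key=lambda v: rumor_centrality(tree, v))


# Star with hub 0 and leaves 1, 2, 3 (adjacency lists).
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
hub_score = rumor_centrality(star, 0)    # 4! / (4 * 1 * 1 * 1)
leaf_score = rumor_centrality(star, 1)   # 4! / (4 * 3 * 1 * 1)
```

This version recomputes subtree sizes for every candidate root, which costs O(n^2) overall and is meant only to show the quantity being maximized.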

preprint2010arXiv

The Balanced Unicast and Multicast Capacity Regions of Large Wireless Networks

We consider the question of determining the scaling of the $n^2$-dimensional balanced unicast and the $n 2^n$-dimensional balanced multicast capacity regions of a wireless network with $n$ nodes placed uniformly at random in a square region of area $n$ and communicating over Gaussian fading channels. We identify this scaling of both the balanced unicast and multicast capacity regions in terms of $Θ(n)$, out of $2^n$ total possible, cuts. These cuts only depend on the geometry of the locations of the source nodes and their destination nodes and the traffic demands between them, and thus can be readily evaluated. Our results are constructive and provide optimal (in the scaling sense) communication schemes.

preprint2009arXiv

On Capacity Scaling in Arbitrary Wireless Networks

In recent work, Ozgur, Leveque, and Tse (2007) obtained a complete scaling characterization of throughput scaling for random extended wireless networks (i.e., $n$ nodes are placed uniformly at random in a square region of area $n$). They showed that for small path-loss exponents $α\in(2,3]$ cooperative communication is order optimal, and for large path-loss exponents $α> 3$ multi-hop communication is order optimal. However, their results (both the communication scheme and the proof technique) are strongly dependent on the regularity induced with high probability by the random node placement. In this paper, we consider the problem of characterizing the throughput scaling in extended wireless networks with arbitrary node placement. As a main result, we propose a more general novel cooperative communication scheme that works for arbitrarily placed nodes. For small path-loss exponents $α\in (2,3]$, we show that our scheme is order optimal for all node placements, and achieves exactly the same throughput scaling as in Ozgur et al. This shows that the regularity of the node placement does not affect the scaling of the achievable rates for $α\in (2,3]$. The situation is, however, markedly different for large path-loss exponents $α>3$. We show that in this regime the scaling of the achievable per-node rates depends crucially on the regularity of the node placement. We then present a family of schemes that smoothly "interpolate" between multi-hop and cooperative communication, depending upon the level of regularity in the node placement. We establish order optimality of these schemes under adversarial node placement for $α> 3$.

preprint2008arXiv

Adaptive Alternating Minimization Algorithms

The classical alternating minimization (or projection) algorithm has been successful in the context of solving optimization problems over two variables. The iterative nature and simplicity of the algorithm has led to its application to many areas such as signal processing, information theory, control, and finance. A general set of sufficient conditions for the convergence and correctness of the algorithm is quite well-known when the underlying problem parameters are fixed. In many practical situations, however, the underlying problem parameters are changing over time, and the use of an adaptive algorithm is more appropriate. In this paper, we study such an adaptive version of the alternating minimization algorithm. As a main result of this paper, we provide a general set of sufficient conditions for the convergence and correctness of the adaptive algorithm. Perhaps surprisingly, these conditions seem to be the minimal ones one would expect in such an adaptive setting. We present applications of our results to adaptive decomposition of mixtures, adaptive log-optimal portfolio selection, and adaptive filter design.
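
For orientation, the classical (fixed-parameter) alternating minimization scheme can be sketched on a toy two-variable problem. The objective below is an assumption chosen so that each partial minimization has a closed form; the sketch does not implement the adaptive variant studied in the paper.

```python
from typing import Tuple


def alternating_minimization(steps: int = 50) -> Tuple[float, float]:
    """Minimize f(x, y) = (x - 1)^2 + (y - 2)^2 + (x - y)^2 one variable at a time."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        x = (1.0 + y) / 2.0  # exact minimizer over x with y held fixed
        y = (2.0 + x) / 2.0  # exact minimizer over y with x held fixed
    return x, y


x_star, y_star = alternating_minimization()
```

For this objective the iterates contract toward the joint minimizer (4/3, 5/3). In the adaptive setting described in the abstract, the constants 1 and 2 would play the role of problem parameters that drift over time.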

preprint2008arXiv

ARQ for Network Coding

A new coding and queue management algorithm is proposed for communication networks that employ linear network coding. The algorithm has the feature that the encoding process is truly online, as opposed to a block-by-block approach. The setup assumes a packet erasure broadcast channel with stochastic arrivals and full feedback, but the proposed scheme is potentially applicable to more general lossy networks with link-by-link feedback. The algorithm guarantees that the physical queue size at the sender tracks the backlog in degrees of freedom (also called the virtual queue size). The new notion of a node "seeing" a packet is introduced. In terms of this idea, our algorithm may be viewed as a natural extension of ARQ schemes to coded networks. Our approach, known as the drop-when-seen algorithm, is compared with a baseline queuing approach called drop-when-decoded. It is shown that the expected queue size for our approach is $O(\frac1{1-ρ})$ as opposed to $Ω(\frac1{(1-ρ)^2})$ for the baseline approach, where $ρ$ is the load factor.

preprint2008arXiv

Message-passing for Maximum Weight Independent Set

We investigate the use of message-passing algorithms for the problem of finding the max-weight independent set (MWIS) in a graph. First, we study the performance of the classical loopy max-product belief propagation. We show that each fixed point estimate of max-product can be mapped in a natural way to an extreme point of the LP polytope associated with the MWIS problem. However, this extreme point may not be the one that maximizes the value of node weights; the particular extreme point at final convergence depends on the initialization of max-product. We then show that if max-product is started from the natural initialization of uninformative messages, it always solves the correct LP -- if it converges. This result is obtained via a direct analysis of the iterative algorithm, and cannot be obtained by looking only at fixed points. The tightness of the LP relaxation is thus necessary for max-product optimality, but it is not sufficient. Motivated by this observation, we show that a simple modification of max-product becomes gradient descent on (a convexified version of) the dual of the LP, and converges to the dual optimum. We also develop a message-passing algorithm that recovers the primal MWIS solution from the output of the descent algorithm. We show that the MWIS estimate obtained using these two algorithms in conjunction is correct when the graph is bipartite and the MWIS is unique. Finally, we show that any problem of MAP estimation for probability distributions over finite domains can be reduced to an MWIS problem. We believe this reduction will yield new insights and algorithms for MAP estimation.

preprint2008arXiv

Network coding meets TCP

We propose a mechanism that incorporates network coding into TCP with only minor changes to the protocol stack, thereby allowing incremental deployment. In our scheme, the source transmits random linear combinations of packets currently in the congestion window. At the heart of our scheme is a new interpretation of ACKs - the sink acknowledges every degree of freedom (i.e., a linear combination that reveals one unit of new information) even if it does not reveal an original packet immediately. Such ACKs enable a TCP-like sliding-window approach to network coding. Our scheme has the nice property that packet losses are essentially masked from the congestion control algorithm. Our algorithm therefore reacts to packet drops in a smooth manner, resulting in a novel and effective approach for congestion control over networks involving lossy links such as wireless links. Our experiments show that our algorithm achieves higher throughput compared to TCP in the presence of lossy wireless links. We also establish the soundness and fairness properties of our algorithm.
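
The ACK rule described above (acknowledge every new degree of freedom) can be sketched as a rank check at the sink. For simplicity this sketch assumes coefficients over GF(2), stored as bitmasks in which bit i marks packet i; the abstract does not specify the field, and the class and method names are hypothetical.

```python
class Sink:
    """Tracks received linear combinations over GF(2) as an XOR basis."""

    def __init__(self) -> None:
        self.basis = {}  # leading-bit position -> stored coefficient vector

    def receive(self, coeffs: int) -> bool:
        """Return True (send an ACK) iff `coeffs` adds a new degree of freedom."""
        v = coeffs
        while v:
            lead = v.bit_length() - 1
            if lead not in self.basis:
                self.basis[lead] = v  # innovative: the rank grows by one
                return True
            v ^= self.basis[lead]     # cancel the leading bit and keep reducing
        return False                  # combination was already in the span

    @property
    def degrees_of_freedom(self) -> int:
        return len(self.basis)


sink = Sink()
# Combinations p0+p1, p1+p2, p0+p2 (redundant), then p0 alone.
acks = [sink.receive(c) for c in (0b011, 0b110, 0b101, 0b001)]
```

The first two combinations are acknowledged even though neither reveals an original packet on its own, which is the behavior the new ACK interpretation relies on.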

preprint2007arXiv

Maximum Weight Matching via Max-Product Belief Propagation

Max-product "belief propagation" is an iterative, local, message-passing algorithm for finding the maximum a posteriori (MAP) assignment of a discrete probability distribution specified by a graphical model. Despite the spectacular success of the algorithm in many application areas such as iterative decoding, computer vision and combinatorial optimization which involve graphs with many cycles, theoretical results about both correctness and convergence of the algorithm are known in few cases (Weiss-Freeman, Wainwright, Yedidia-Weiss-Freeman, Richardson-Urbanke). In this paper we consider the problem of finding the Maximum Weight Matching (MWM) in a weighted complete bipartite graph. We define a probability distribution on the bipartite graph whose MAP assignment corresponds to the MWM. We use the max-product algorithm for finding the MAP of this distribution or equivalently, the MWM on the bipartite graph. Even though the underlying bipartite graph has many short cycles, we find that surprisingly, the max-product algorithm always converges to the correct MAP assignment as long as the MAP assignment is unique. We provide a bound on the number of iterations required by the algorithm and evaluate the computational cost of the algorithm. We find that for a graph of size $n$, the computational cost of the algorithm scales as $O(n^3)$, which is the same as the computational cost of the best known algorithm. Finally, we establish the precise relation between the max-product algorithm and the celebrated auction algorithm proposed by Bertsekas. This suggests possible connections between dual algorithms and the max-product algorithm for discrete optimization problems.
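
A minimal sketch of message passing for bipartite matching follows. It uses a log-domain simplification of max-product in which each edge carries one scalar message per direction; this particular update rule, the fixed iteration count, and the function name are assumptions rather than details given in the abstract.

```python
from typing import List


def max_product_matching(w: List[List[float]], iterations: int = 50) -> List[int]:
    """Return match[i] = right node chosen by left node i (requires n >= 2)."""
    n = len(w)
    # a[i][j]: message from left node i to right node j; b[j][i]: the reverse.
    a = [[w[i][j] for j in range(n)] for i in range(n)]
    b = [[w[i][j] for i in range(n)] for j in range(n)]
    for _ in range(iterations):
        # Synchronous update: each message is the edge weight minus the best
        # competing message that arrived at the sender in the previous round.
        new_a = [[w[i][j] - max(b[l][i] for l in range(n) if l != j)
                  for j in range(n)] for i in range(n)]
        new_b = [[w[i][j] - max(a[k][j] for k in range(n) if k != i)
                  for i in range(n)] for j in range(n)]
        a, b = new_a, new_b
    # Each left node picks the right node whose incoming message is largest.
    return [max(range(n), key=lambda j: b[j][i]) for i in range(n)]


# Toy 3x3 instance whose unique maximum weight matching is i -> (i + 1) mod 3.
weights = [[1.0, 10.0, 2.0],
           [2.0, 1.0, 10.0],
           [10.0, 2.0, 1.0]]
matching = max_product_matching(weights)
```

Each round updates all n^2 messages and each update scans n values, so this naive form costs O(n^3) per iteration. Message magnitudes can keep growing from round to round; only the argmax decisions are expected to settle.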

preprint2007arXiv

Product Multicommodity Flow in Wireless Networks

We provide a tight approximate characterization of the $n$-dimensional product multicommodity flow (PMF) region for a wireless network of $n$ nodes. Separate characterizations in terms of the spectral properties of appropriate network graphs are obtained in both an information theoretic sense and for a combinatorial interference model (e.g., Protocol model). These provide an inner approximation to the $n^2$ dimensional capacity region. These results answer the following questions which arise naturally from previous work: (a) What is the significance of $1/\sqrt{n}$ in the scaling laws for the Protocol interference model obtained by Gupta and Kumar (2000)? (b) Can we obtain a tight approximation to the "maximum supportable flow" for node distributions more general than the geometric random distribution, traffic models other than randomly chosen source-destination pairs, and under very general assumptions on the channel fading model? We first establish that the random source-destination model is essentially a one-dimensional approximation to the capacity region, and a special case of product multi-commodity flow. Building on previous results, for a combinatorial interference model given by a network and a conflict graph, we relate the product multicommodity flow to the spectral properties of the underlying graphs resulting in computational upper and lower bounds. For the more interesting random fading model with additive white Gaussian noise (AWGN), we show that the scaling laws for PMF can again be tightly characterized by the spectral properties of appropriately defined graphs. As an implication, we obtain computationally efficient upper and lower bounds on the PMF for any wireless network with a guaranteed approximation factor.

preprint2006arXiv

Network Coding in a Multicast Switch

We consider the problem of serving multicast flows in a crossbar switch. We show that linear network coding across packets of a flow can sustain traffic patterns that cannot be served if network coding were not allowed. Thus, network coding leads to a larger rate region in a multicast crossbar switch. We demonstrate a traffic pattern which requires a switch speedup if coding is not allowed, whereas, with coding the speedup requirement is eliminated completely. In addition to throughput benefits, coding simplifies the characterization of the rate region. We give a graph-theoretic characterization of the rate region with fanout splitting and intra-flow coding, in terms of the stable set polytope of the 'enhanced conflict graph' of the traffic pattern. Such a formulation is not known in the case of fanout splitting without coding. We show that computing the offline schedule (i.e. using prior knowledge of the flow arrival rates) can be reduced to certain graph coloring problems. Finally, we propose online algorithms (i.e. using only the current queue occupancy information) for multicast scheduling based on our graph-theoretic formulation. In particular, we show that a maximum weighted stable set algorithm stabilizes the queues for all rates within the rate region.

60 published item(s)

Causal Inference with Categorical Unobserved Confounder via Mixture Learning

OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries

Federated Optimization of Smooth Loss Functions

Robust Max Entrywise Error Bounds for Tensor Estimation from Sparse Observations via Similarity Based Collaborative Filtering

Current Implicit Policies May Not Eradicate COVID-19

Gradient Descent for Low-Rank Functions

On Multivariate Singular Spectrum Analysis and its Variants

Regret, stability & fairness in matching markets with bandit learners

Unifying Epidemic Models with Mixtures

Time varying regression with hidden linear dynamics

tspDB: Time Series Predict DB

Deconvolution with Unknown Error Distribution Interpreted as Blind Isotonic Regression

Estimation of Skill Distributions

Learning RUMs: Reducing Mixture to Single Component via PCA

Non-Asymptotic Analysis of Monte Carlo Tree Search

On Reinforcement Learning for Turn-based Zero-sum Markov Games

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Stable Reinforcement Learning with Unbounded State Space

Two Burning Questions on COVID-19: Did shutting down the economy help? Can we (partially) reopen the economy without risking the second wave?

Regret Guarantees for Item-Item Collaborative Filtering

A Latent Source Model for Patch-Based Image Segmentation

Approximating the Stationary Probability of a Single State in a Markov chain

Finding Rumor Sources on Random Trees

Rank Centrality: Ranking from Pair-wise Comparisons

A Latent Source Model for Online Collaborative Filtering

Bayesian regression and Bitcoin

Hardness of parameter estimation in graphical models

Learning graphical models from the Glauber dynamics

Learning Mixed Multinomial Logit Model from Ordinal Data

On Queue-Size Scaling for Input-Queued Switches

Statistical inference with probabilistic graphical models

Structure learning of antiferromagnetic Ising models

A Latent Source Model for Nonparametric Time Series Classification

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Partition-Merge: Distributed Inference and Modularity Optimization

Belief Propagation for Min-cost Network Flow: Convergence and Correctness

De-randomizing Shannon: The Design and Analysis of a Capacity-Achieving Rateless Code

Switched networks with maximum weight policies: Fluid approximation and multiplicative state space collapse

A Nonparametric Approach to Modeling Choice with Limited Data

Assortment Optimization Under General Choice

Caching in Wireless Networks

Efficient Distributed Medium Access

Inferring Rankings Using Constrained Sensing

Sparse Choice Models

A Simple Message-Passing Algorithm for Compressed Sensing

Efficient Queue-based CSMA with Collisions

Fair Scheduling in Networks Through Packet Election

On the Flow-level Dynamics of a Packet-switched Network

Qualitative Properties of alpha-Weighted Scheduling Policies

Randomized Scheduling Algorithm for Queueing Networks

Rumors in a Network: Who's the Culprit?

The Balanced Unicast and Multicast Capacity Regions of Large Wireless Networks

On Capacity Scaling in Arbitrary Wireless Networks

Adaptive Alternating Minimization Algorithms

ARQ for Network Coding

Message-passing for Maximum Weight Independent Set

Network coding meets TCP

Maximum Weight Matching via Max-Product Belief Propagation

Product Multicommodity Flow in Wireless Networks

Network Coding in a Multicast Switch