Source author record

Mohsen Bayati

Mohsen Bayati appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Information Theory math.IT math.ST Statistics Theory Artificial Intelligence Computer Science and Game Theory math.PR Computation Computational Complexity Discrete Mathematics econ.EM math-ph math.MP math.OC Multiagent Systems

Catalog footprint

What is connected

16works

16topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A General Theory of the Stochastic Linear Bandit and Its Applications

Recent growing adoption of experimentation in practice has led to a surge of attention to multiarmed bandits as a technique to reduce the opportunity cost of online experiments. In this setting, a decision-maker sequentially chooses among a set of given actions, observes their noisy rewards, and aims to maximize her cumulative expected reward (or minimize regret) over a horizon of length $T$. In this paper, we introduce a general analysis framework and a family of algorithms for the stochastic linear bandit problem that includes well-known algorithms such as the optimism-in-the-face-of-uncertainty-linear-bandit (OFUL) and Thompson sampling (TS) as special cases. Our analysis technique bridges several streams of prior literature and yields a number of new results. First, our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG) that reduces the overexploration problem in OFUL. SG utilizes the data to discard actions with relatively low uncertainty and then choosing one among the remaining actions greedily. In addition to proving that SG is theoretically rate optimal, our empirical simulations show that SG outperforms existing benchmarks such as greedy, OFUL, and TS. The second application of our general framework is (to the best of our knowledge) the first polylogarithmic (in $T$) regret bounds for OFUL and TS, under similar conditions as the ones by Goldenshluger and Zeevi (2013). Finally, we obtain sharper regret bounds for the $k$-armed contextual MABs by a factor of $\sqrt{k}$.

preprint2022arXiv

Matrix Completion Methods for Causal Panel Data Models

In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the "missing" elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.

preprint2022arXiv

The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

In this note, we introduce a general version of the well-known elliptical potential lemma that is a widely used technique in the analysis of algorithms in sequential learning and decision-making problems. We consider a stochastic linear bandit setting where a decision-maker sequentially chooses among a set of given actions, observes their noisy rewards, and aims to maximize her cumulative expected reward over a decision-making horizon. The elliptical potential lemma is a key tool for quantifying uncertainty in estimating parameters of the reward function, but it requires the noise and the prior distributions to be Gaussian. Our general elliptical potential lemma relaxes this Gaussian requirement which is a highly non-trivial extension for a number of reasons; unlike the Gaussian case, there is no closed-form solution for the covariance matrix of the posterior distribution, the covariance matrix is not a deterministic function of the actions, and the covariance matrix is not decreasing with respect to the semidefinite inequality. While this result is of broad interest, we showcase an application of it to prove an improved Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets where prior and noise distributions are general. This bound is minimax optimal up to constants.

preprint2020arXiv

Mostly Exploration-Free Algorithms for Contextual Bandits

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate optimal (achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate optimal with positive probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or upper confidence bound (UCB).

preprint2016arXiv

Dynamic Pricing with Demand Covariates

We consider a firm that sells products over $T$ periods without knowing the demand function. The firm sequentially sets prices to earn revenue and to learn the underlying demand function simultaneously. A natural heuristic for this problem, commonly used in practice, is greedy iterative least squares (GILS). At each time period, GILS estimates the demand as a linear function of the price by applying least squares to the set of prior prices and realized demands. Then a price that maximizes the revenue, given the estimated demand function, is used for the next time period. The performance is measured by the regret, which is the expected revenue loss from the optimal (oracle) pricing policy when the demand function is known. Recently, den Boer and Zwart (2014) and Keskin and Zeevi (2014) demonstrated that GILS is sub-optimal. They introduced algorithms which integrate forced price dispersion with GILS and achieve asymptotically optimal performance. In this paper, we consider this dynamic pricing problem in a data-rich environment. In particular, we assume that the firm knows the expected demand under a particular price from historical data, and in each period, before setting the price, the firm has access to extra information (demand covariates) which may be predictive of the demand. We prove that in this setting GILS achieves asymptotically optimal regret of order $\log(T)$. We also show the following surprising result: in the original dynamic pricing problem of den Boer and Zwart (2014) and Keskin and Zeevi (2014), inclusion of any set of covariates in GILS as potential demand covariates (even though they could carry no information) would make GILS asymptotically optimal. We validate our results via extensive numerical simulations on synthetic and real data sets.

preprint2016arXiv

Scalable Approximations for Generalized Linear Problems

In stochastic optimization, the population risk is generally approximated by the empirical risk. However, in the large-scale setting, minimization of the empirical risk may be computationally restrictive. In this paper, we design an efficient algorithm to approximate the population risk minimizer in generalized linear problems such as binary classification with surrogate losses and generalized linear regression models. We focus on large-scale problems, where the iterative minimization of the empirical risk is computationally intractable, i.e., the number of observations $n$ is much larger than the dimension of the parameter $p$, i.e. $n \gg p \gg 1$. We show that under random sub-Gaussian design, the true minimizer of the population risk is approximately proportional to the corresponding ordinary least squares (OLS) estimator. Using this relation, we design an algorithm that achieves the same accuracy as the empirical risk minimizer through iterations that attain up to a cubic convergence rate, and that are cheaper than any batch optimization algorithm by at least a factor of $\mathcal{O}(p)$. We provide theoretical guarantees for our algorithm, and analyze the convergence behavior in terms of data dimensions. Finally, we demonstrate the performance of our algorithm on well-known classification and regression problems, through extensive numerical studies on large-scale datasets, and show that it achieves the highest performance compared to several other widely used and specialized optimization algorithms.

preprint2015arXiv

The LASSO risk for gaussian matrices

We consider the problem of learning a coefficient vector x_0\in R^N from noisy linear observation y=Ax_0+w \in R^n. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x'. In this case, a popular approach consists in solving an L1-penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Simulations on real data matrices suggest that our results can be relevant in a broad array of practical applications.

preprint2015arXiv

Universality in polytope phase transitions and message passing algorithms

We consider a class of nonlinear mappings $\mathsf{F}_{A,N}$ in $\mathbb{R}^N$ indexed by symmetric random matrices $A\in\mathbb{R}^{N\times N}$ with independent entries. Within spin glass theory, special cases of these mappings correspond to iterating the TAP equations and were studied by Bolthausen [Comm. Math. Phys. 325 (2014) 333-366]. Within information theory, they are known as "approximate message passing" algorithms. We study the high-dimensional (large $N$) behavior of the iterates of $\mathsf{F}$ for polynomial functions $\mathsf{F}$, and prove that it is universal; that is, it depends only on the first two moments of the entries of $A$, under a sub-Gaussian tail condition. As an application, we prove the universality of a certain phase transition arising in polytope geometry and compressed sensing. This solves, for a broad class of random projections, a conjecture by David Donoho and Jared Tanner.

preprint2013arXiv

Combinatorial approach to the interpolation method and scaling limits in sparse random graphs

We establish the existence of free energy limits for several combinatorial models on Erdös-Rényi graph $\mathbb {G}(N,\lfloor cN\rfloor)$ and random $r$-regular graph $\mathbb {G}(N,r)$. For a variety of models, including independent sets, MAX-CUT, coloring and K-SAT, we prove that the free energy both at a positive and zero temperature, appropriately rescaled, converges to a limit as the size of the underlying graph diverges to infinity. In the zero temperature case, this is interpreted as the existence of the scaling limit for the corresponding combinatorial optimization problem. For example, as a special case we prove that the size of a largest independent set in these graphs, normalized by the number of nodes converges to a limit w.h.p. This resolves an open problem which was proposed by Aldous (Some open problems) as one of his six favorite open problems. It was also mentioned as an open problem in several other places: Conjecture 2.20 in Wormald [In Surveys in Combinatorics, 1999 (Canterbury) (1999) 239-298 Cambridge Univ. Press]; Bollobás and Riordan [Random Structures Algorithms 39 (2011) 1-38]; Janson and Thomason [Combin. Probab. Comput. 17 (2008) 259-264] and Aldous and Steele [In Probability on Discrete Structures (2004) 1-72 Springer].

preprint2012arXiv

A Sequential Algorithm for Generating Random Graphs

We present a nearly-linear time algorithm for counting and randomly generating simple graphs with a given degree sequence in a certain range. For degree sequence $(d_i)_{i=1}^n$ with maximum degree $d_{\max}=O(m^{1/4-τ})$, our algorithm generates almost uniform random graphs with that degree sequence in time $O(m\,d_{\max})$ where $m=\f{1}{2}\sum_id_i$ is the number of edges in the graph and $τ$ is any positive constant. The fastest known algorithm for uniform generation of these graphs McKay Wormald (1990) has a running time of $O(m^2d_{\max}^2)$. Our method also gives an independent proof of McKay's estimate McKay (1985) for the number of such graphs. We also use sequential importance sampling to derive fully Polynomial-time Randomized Approximation Schemes (FPRAS) for counting and uniformly generating random graphs for the same range of $d_{\max}=O(m^{1/4-τ})$. Moreover, we show that for $d = O(n^{1/2-τ})$, our algorithm can generate an asymptotically uniform $d$-regular graph. Our results improve the previous bound of $d = O(n^{1/3-τ})$ due to Kim and Vu (2004) for regular graphs.

preprint2011arXiv

Bargaining dynamics in exchange networks

We consider a one-sided assignment market or exchange network with transferable utility and propose a model for the dynamics of bargaining in such a market. Our dynamical model is local, involving iterative updates of 'offers' based on estimated best alternative matches, in the spirit of pairwise Nash bargaining. We establish that when a balanced outcome (a generalization of the pairwise Nash bargaining solution to networks) exists, our dynamics converges rapidly to such an outcome. We extend our results to the cases of (i) general agent 'capacity constraints', i.e., an agent may be allowed to participate in multiple matches, and (ii) 'unequal bargaining powers' (where we also find a surprising change in rate of convergence).

preprint2011arXiv

Belief-Propagation for Weighted b-Matchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions

We consider the general problem of finding the minimum weight $\bm$-matching on arbitrary graphs. We prove that, whenever the linear programming (LP) relaxation of the problem has no fractional solutions, then the belief propagation (BP) algorithm converges to the correct solution. We also show that when the LP relaxation has a fractional solution then the BP algorithm can be used to solve the LP relaxation. Our proof is based on the notion of graph covers and extends the analysis of (Bayati-Shah-Sharma 2005 and Huang-Jebara 2007}. These results are notable in the following regards: (1) It is one of a very small number of proofs showing correctness of BP without any constraint on the graph structure. (2) Variants of the proof work for both synchronous and asynchronous BP; it is the first proof of convergence and correctness of an asynchronous BP algorithm for a combinatorial optimization problem.

preprint2011arXiv

Message Passing Algorithms for Sparse Network Alignment

Network alignment generalizes and unifies several approaches for forming a matching or alignment between the vertices of two graphs. We study a mathematical programming framework for network alignment problem and a sparse variation of it where only a small number of matches between the vertices of the two graphs are possible. We propose a new message passing algorithm that allows us to compute, very efficiently, approximate solutions to the sparse network alignment problems with graph sizes as large as hundreds of thousands of vertices. We also provide extensive simulations comparing our algorithms with two of the best solvers for network alignment problems on two synthetic matching problems, two bioinformatics problems, and three large ontology alignment problems including a multilingual problem with a known labeled alignment.

preprint2011arXiv

The dynamics of message passing on dense graphs, with applications to compressed sensing

Approximate message passing algorithms proved to be extremely effective in reconstructing sparse signals from a small number of incoherent linear measurements. Extensive numerical experiments further showed that their dynamics is accurately tracked by a simple one-dimensional iteration termed state evolution. In this paper we provide the first rigorous foundation to state evolution. We prove that indeed it holds asymptotically in the large system limit for sensing matrices with independent and identically distributed gaussian entries. While our focus is on message passing algorithms for compressed sensing, the analysis extends beyond this setting, to a general class of algorithms on dense graphs. In this context, state evolution plays the role that density evolution has for sparse graphs. The proof technique is fundamentally different from the standard approach to density evolution, in that it copes with large number of short loops in the underlying factor graph. It relies instead on a conditioning technique recently developed by Erwin Bolthausen in the context of spin glass theory.

preprint2010arXiv

A Natural Dynamics for Bargaining on Exchange Networks

Bargaining networks model the behavior of a set of players that need to reach pairwise agreements for making profits. Nash bargaining solutions are special outcomes of such games that are both stable and balanced. Kleinberg and Tardos proved a sharp algorithmic characterization of such outcomes, but left open the problem of how the actual bargaining process converges to them. A partial answer was provided by Azar et al. who proposed a distributed algorithm for constructing Nash bargaining solutions, but without polynomial bounds on its convergence rate. In this paper, we introduce a simple and natural model for this process, and study its convergence rate to Nash bargaining solutions. At each time step, each player proposes a deal to each of her neighbors. The proposal consists of a share of the potential profit in case of agreement. The share is chosen to be balanced in Nash's sense as far as this is feasible (with respect to the current best alternatives for both players). We prove that, whenever the Nash bargaining solution is unique (and satisfies a positive gap condition) this dynamics converges to it in polynomial time. Our analysis is based on an approximate decoupling phenomenon between the dynamics on different substructures of the network. This approach may be of general interest for the analysis of local algorithms on networks.

preprint2007arXiv

Maximum Weight Matching via Max-Product Belief Propagation

Max-product "belief propagation" is an iterative, local, message-passing algorithm for finding the maximum a posteriori (MAP) assignment of a discrete probability distribution specified by a graphical model. Despite the spectacular success of the algorithm in many application areas such as iterative decoding, computer vision and combinatorial optimization which involve graphs with many cycles, theoretical results about both correctness and convergence of the algorithm are known in few cases (Weiss-Freeman Wainwright, Yeddidia-Weiss-Freeman, Richardson-Urbanke}. In this paper we consider the problem of finding the Maximum Weight Matching (MWM) in a weighted complete bipartite graph. We define a probability distribution on the bipartite graph whose MAP assignment corresponds to the MWM. We use the max-product algorithm for finding the MAP of this distribution or equivalently, the MWM on the bipartite graph. Even though the underlying bipartite graph has many short cycles, we find that surprisingly, the max-product algorithm always converges to the correct MAP assignment as long as the MAP assignment is unique. We provide a bound on the number of iterations required by the algorithm and evaluate the computational cost of the algorithm. We find that for a graph of size $n$, the computational cost of the algorithm scales as $O(n^3)$, which is the same as the computational cost of the best known algorithm. Finally, we establish the precise relation between the max-product algorithm and the celebrated {\em auction} algorithm proposed by Bertsekas. This suggests possible connections between dual algorithm and max-product algorithm for discrete optimization problems.

Mohsen Bayati

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

A General Theory of the Stochastic Linear Bandit and Its Applications

Matrix Completion Methods for Causal Panel Data Models

The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

Mostly Exploration-Free Algorithms for Contextual Bandits

Dynamic Pricing with Demand Covariates

Scalable Approximations for Generalized Linear Problems

The LASSO risk for gaussian matrices

Universality in polytope phase transitions and message passing algorithms

Combinatorial approach to the interpolation method and scaling limits in sparse random graphs

A Sequential Algorithm for Generating Random Graphs

Bargaining dynamics in exchange networks

Belief-Propagation for Weighted b-Matchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions

Message Passing Algorithms for Sparse Network Alignment

The dynamics of message passing on dense graphs, with applications to compressed sensing

A Natural Dynamics for Bargaining on Exchange Networks

Maximum Weight Matching via Max-Product Belief Propagation