Researcher profile

Mohsen Bayati

Mohsen Bayati contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

A General Theory of the Stochastic Linear Bandit and Its Applications

Recent growing adoption of experimentation in practice has led to a surge of attention to multiarmed bandits as a technique to reduce the opportunity cost of online experiments. In this setting, a decision-maker sequentially chooses among a set of given actions, observes their noisy rewards, and aims to maximize her cumulative expected reward (or minimize regret) over a horizon of length $T$. In this paper, we introduce a general analysis framework and a family of algorithms for the stochastic linear bandit problem that includes well-known algorithms such as the optimism-in-the-face-of-uncertainty-linear-bandit (OFUL) and Thompson sampling (TS) as special cases. Our analysis technique bridges several streams of prior literature and yields a number of new results. First, our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG) that reduces the overexploration problem in OFUL. SG utilizes the data to discard actions with relatively low uncertainty and then choosing one among the remaining actions greedily. In addition to proving that SG is theoretically rate optimal, our empirical simulations show that SG outperforms existing benchmarks such as greedy, OFUL, and TS. The second application of our general framework is (to the best of our knowledge) the first polylogarithmic (in $T$) regret bounds for OFUL and TS, under similar conditions as the ones by Goldenshluger and Zeevi (2013). Finally, we obtain sharper regret bounds for the $k$-armed contextual MABs by a factor of $\sqrt{k}$.

preprint2022arXiv

Matrix Completion Methods for Causal Panel Data Models

In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the "missing" elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.

preprint2022arXiv

The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

In this note, we introduce a general version of the well-known elliptical potential lemma that is a widely used technique in the analysis of algorithms in sequential learning and decision-making problems. We consider a stochastic linear bandit setting where a decision-maker sequentially chooses among a set of given actions, observes their noisy rewards, and aims to maximize her cumulative expected reward over a decision-making horizon. The elliptical potential lemma is a key tool for quantifying uncertainty in estimating parameters of the reward function, but it requires the noise and the prior distributions to be Gaussian. Our general elliptical potential lemma relaxes this Gaussian requirement which is a highly non-trivial extension for a number of reasons; unlike the Gaussian case, there is no closed-form solution for the covariance matrix of the posterior distribution, the covariance matrix is not a deterministic function of the actions, and the covariance matrix is not decreasing with respect to the semidefinite inequality. While this result is of broad interest, we showcase an application of it to prove an improved Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets where prior and noise distributions are general. This bound is minimax optimal up to constants.

preprint2020arXiv

Mostly Exploration-Free Algorithms for Contextual Bandits

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate optimal (achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate optimal with positive probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or upper confidence bound (UCB).