Source author record

Sébastien Bubeck

Sébastien Bubeck appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, math.ST, Statistics Theory, math.OC, math.PR, Data Structures and Algorithms, Social and Information Networks, Discrete Mathematics, math.CO, Numerical Analysis, Information Theory, math.IT, math.MG, Computational Complexity, Computer Science and Game Theory, Multiagent Systems, Systems and Control

Catalog footprint

What is connected

40 works

17 topics

4 close collaborators


preprint 2022 arXiv

First-Order Bayesian Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior over the adversary's sequence of losses. In this framework, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information ratio, resulting in optimal worst-case regret bounds. In this paper we introduce three novel ideas to this line of work. First we propose a new quantity, the scale-sensitive information ratio, which allows us to obtain more refined first-order regret bounds (i.e., bounds of the form $\sqrt{L^*}$ where $L^*$ is the loss of the best combinatorial action). Second we replace the entropy over combinatorial actions by a coordinate entropy, which allows us to obtain the first optimal worst-case bound for Thompson Sampling in the combinatorial setting. Finally, we introduce a novel link between Bayesian agents and frequentist confidence intervals. Combining these ideas we show that the classical multi-armed bandit first-order regret bound $\tilde{O}(\sqrt{d L^*})$ still holds true in the more challenging and more general semi-bandit scenario. This latter result improves the previous state of the art bound $\tilde{O}(\sqrt{(d+m^3)L^*})$ by Lykouris, Sridharan and Tardos. Moreover we sharpen these results with two technical ingredients. The first leverages a recent insight of Zimmert and Lattimore to replace Shannon entropy with more refined potential functions in the analysis. The second is a \emph{Thresholded} Thompson sampling algorithm, which slightly modifies the original algorithm by never playing low-probability actions. This thresholding results in fully $T$-independent regret bounds when $L^*$ is almost surely upper-bounded, which we show does not hold for ordinary Thompson sampling.

preprint 2021 arXiv

Complexity of Highly Parallel Non-Smooth Convex Optimization

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

preprint 2020 arXiv

Coordination without communication: optimal regret in two players multi-armed bandits

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.

preprint 2020 arXiv

How to trap a gradient flow

We consider the problem of finding an $\varepsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. In contrast with dimension-free approaches such as gradient descent, we focus here on the case where $d$ is finite, and potentially small. This viewpoint was explored in 1993 by Vavasis, who proposed an algorithm which, for any fixed finite dimension $d$, improves upon the $O(1/\varepsilon^2)$ oracle complexity of gradient descent. For example for $d=2$, Vavasis' approach obtains the complexity $O(1/\varepsilon)$. Moreover for $d=2$ he also proved a lower bound of $Ω(1/\sqrt{\varepsilon})$ for deterministic algorithms (we extend this result to randomized algorithms). Our main contribution is an algorithm, which we call gradient flow trapping (GFT), and the analysis of its oracle complexity. In dimension $d=2$, GFT closes the gap with Vavasis' lower bound (up to a logarithmic factor), as we show that it has complexity $O\left(\sqrt{\frac{\log(1/\varepsilon)}{\varepsilon}}\right)$. In dimension $d=3$, we show a complexity of $O\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$, improving upon Vavasis' $O\left(1 / \varepsilon^{1.2} \right)$. In higher dimensions, GFT has the remarkable property of being a logarithmic parallel depth strategy, in stark contrast with the polynomial depth of gradient descent or Vavasis' algorithm. In this higher dimensional regime, the total work of GFT improves quadratically upon the only other known polylogarithmic depth strategy for this problem, namely naive grid search. We augment this result with another algorithm, named \emph{cut and flow} (CF), which improves upon Vavasis' algorithm in any fixed dimension.

preprint 2020 arXiv

Metrical Service Systems with Transformations

We consider a generalization of the fundamental online metrical service systems (MSS) problem where the feasible region can be transformed between requests. In this problem, which we call T-MSS, an algorithm maintains a point in a metric space and has to serve a sequence of requests. Each request is a map (transformation) $f_t\colon A_t\to B_t$ between subsets $A_t$ and $B_t$ of the metric space. To serve it, the algorithm has to go to a point $a_t\in A_t$, paying the distance from its previous position. Then, the transformation is applied, modifying the algorithm's state to $f_t(a_t)$. Such transformations can model, e.g., changes to the environment that are outside of an algorithm's control, and we therefore do not charge any additional cost to the algorithm when the transformation is applied. The transformations also make it possible to model requests occurring in the $k$-taxi problem. We show that for $α$-Lipschitz transformations, the competitive ratio is $Θ(α)^{n-2}$ on $n$-point metrics. Here, the upper bound is achieved by a deterministic algorithm and the lower bound holds even for randomized algorithms. For the $k$-taxi problem, we prove a competitive ratio of $\tilde O((n\log k)^2)$. For chasing convex bodies, we show that even with contracting transformations no competitive algorithm exists. The problem T-MSS has a striking connection to the following deep mathematical question: Given a finite metric space $M$, what is the required cardinality of an extension $\hat M\supseteq M$ where each partial isometry on $M$ extends to an automorphism? We give partial answers for special cases.

preprint 2020 arXiv

Online Learning for Active Cache Synchronization

Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated. This paper introduces synchronization bandits, a MAB variant where all arms generate costs at all times, but the agent observes an arm's instantaneous cost only when the arm is played. Synchronization MABs are inspired by online caching scenarios such as Web crawling, where an arm corresponds to a cached item and playing the arm means downloading its fresh copy from a server. We present MirrorSync, an online learning algorithm for synchronization bandits, establish an adversarial regret of $O(T^{2/3})$ for it, and show how to make it practical.

preprint 2020 arXiv

Online Multiserver Convex Chasing and Optimization

We introduce the problem of $k$-chasing of convex functions, a simultaneous generalization of both the famous k-server problem in $R^d$, and of the problem of chasing convex bodies and functions. Aside from fundamental interest in this general form, it has natural applications to online $k$-clustering problems with objectives such as $k$-median or $k$-means. We show that this problem exhibits a rich landscape of behavior. In general, if both $k > 1$ and $d > 1$ there does not exist any online algorithm with bounded competitiveness. By contrast, we exhibit a class of nicely behaved functions (which include in particular the above-mentioned clustering problems), for which we show that competitive online algorithms exist, and moreover with dimension-free competitive ratio. We also introduce a parallel question of top-$k$ action regret minimization in the realm of online convex optimization. There, too, a much rougher landscape emerges for $k > 1$. While it is possible to achieve vanishing regret, unlike the top-one action case the rate of vanishing does not speed up for strongly convex functions. Moreover, vanishing regret necessitates both intractable computations and randomness. Finally we leave open whether almost dimension-free regret is achievable for $k > 1$ and general convex losses. As evidence that it might be possible, we prove dimension-free regret for linear losses via an information-theoretic argument.

preprint 2016 arXiv

Asymptotic behavior of the Eden model with positively homogeneous edge weights

Let $d\in\mathbb N$, $α\in\mathbb R$, and let $f :\mathbb R^d\setminus \{0\} \rightarrow (0,\infty)$ be locally Lipschitz and positively homogeneous of degree $α$ (e.g. $f$ could be the $α$th power of a norm on $\mathbb R^d$). We study a generalization of the Eden model on $\mathbb Z^d$ wherein the next edge added to the cluster is chosen from the set of all edges incident to the current cluster with probability proportional to the value of $f$ at the midpoint of this edge, rather than uniformly. This model is equivalent to a variant of first passage percolation where the edge passage times are independent exponential random variables with parameters given by the value of $f$ at the midpoint of the edge. We prove that the $f$-weighted Eden model clusters have an a.s. deterministic limit shape if $α< 1$, which is an explicit functional of $f$ and the limit shape of the standard Eden model, and estimate the rate of convergence to this limit shape. We also prove that if $α>1$, then there is a norm $ν$ on $\mathbb R^d$ (depending on $α$) such that if we set $f(z) = ν(z)^{ α}$, then the $f$-weighted Eden model clusters are a.s. contained in a Euclidean cone with opening angle $<π$ for all time. We further show that there does not exist a norm on $\mathbb R^d$ for which this latter statement holds for all $α>1$; and that there is no choice of function $f$ for which the above statement holds with $α=1$. Our basic approach is to compare the local behavior of the $f$-weighted first passage percolation to that of unweighted first passage percolation with iid exponential edge weights (which is equivalent to the unweighted Eden model). We include a list of open problems and several computer simulations.

preprint 2016 arXiv

Basic models and questions in statistical network analysis

Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.

preprint 2016 arXiv

Black-box optimization with a politician

We propose a new framework for black-box convex optimization which is well-suited for situations where gradient computations are expensive. We derive a new method for this framework which leverages several concepts from convex optimization, from standard first-order methods (e.g. gradient descent or quasi-Newton methods) to analytical centers (i.e. minimizers of self-concordant barriers). We demonstrate empirically that our new technique compares favorably with state of the art algorithms (such as BFGS).

preprint 2016 arXiv

Kernel-based methods for bandit convex optimization

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5} \sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $Ω(n \sqrt{T})$ and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^3 / ε^2$.

preprint 2016 arXiv

On paths, stars and wyes in trees

We further the study of local profiles of trees. Bubeck and Linial showed that the set of 5-profiles contains a certain polytope, namely the convex hull of d-millipedes, and they proved that the segment [0-millipede, 1-millipede] corresponds to a face of the set of 5-profiles. Our main result shows that the segment [1-millipede, 2-millipede] also corresponds to a face. Surprisingly we also show that for d > 3 the segment [d-millipede, (d+1)-millipede] is not a face of the set of 5-profiles. We do so by exhibiting new trees which are generalized millipedes with intriguing patterns for their degree sequence. The plot thickens, and the set of 5-profiles remains a mysterious convex set.

preprint 2015 arXiv

A geometric alternative to Nesterov's accelerated gradient descent

We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov's accelerated gradient descent. The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method. We provide some numerical evidence that the new method can be superior to Nesterov's accelerated gradient descent.

preprint 2015 arXiv

Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{Θ}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a "local-to-global" property of convex functions, that may be of independent interest.

preprint 2015 arXiv

Convex Optimization: Algorithms and Complexity

This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by Nesterov's seminal book and Nemirovski's lecture notes, includes the analysis of cutting plane methods, as well as (accelerated) gradient descent schemes. We also pay special attention to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging) and discuss their relevance in machine learning. We provide a gentle introduction to structural optimization with FISTA (to optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror prox (Nemirovski's alternative to Nesterov's smoothing), and a concise description of interior point methods. In stochastic optimization we discuss stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. We also briefly touch upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random walks based methods.

preprint 2015 arXiv

Detecting Markov Random Fields Hidden in White Noise

Motivated by change point problems in time series and the detection of textured objects in images, we consider the problem of detecting a piece of a Gaussian Markov random field hidden in white Gaussian noise. We derive minimax lower bounds and propose near-optimal tests.

preprint 2015 arXiv

Detecting positive correlations in a multivariate sample

We consider the problem of testing whether a correlation matrix of a multivariate normal population is the identity matrix. We focus on sparse classes of alternatives where only a few entries are nonzero and, in fact, positive. We derive a general lower bound applicable to various classes and study the performance of some near-optimal tests. We pay special attention to computational feasibility and construct near-optimal tests that can be computed efficiently. Finally, we apply our results to prove new lower bounds for the clique number of high-dimensional random geometric graphs.

preprint 2015 arXiv

Exceptional rotations of random graphs: a VC theory

In this paper we explore maximal deviations of large random structures from their typical behavior. We introduce a model for a high-dimensional random graph process and ask analogous questions to those of Vapnik and Chervonenkis for deviations of averages: how "rich" does the process have to be so that one sees atypical behavior. In particular, we study a natural process of Erdős-Rényi random graphs indexed by unit vectors in $\mathbb{R}^d$. We investigate the deviations of the process with respect to three fundamental properties: clique number, chromatic number, and connectivity. In all cases we establish upper and lower bounds for the minimal dimension $d$ that guarantees the existence of "exceptional directions" in which the random graph behaves atypically with respect to the property. For each of the three properties, four theorems are established, to describe upper and lower bounds for the threshold dimension in the subcritical and supercritical regimes.

preprint 2015 arXiv

Finding Adam in random growing trees

We investigate algorithms to find the first vertex in large trees generated by either the uniform attachment or preferential attachment model. We require the algorithm to output a set of $K$ vertices, such that, with probability at least $1-ε$, the first vertex is in this set. We show that for any $ε$, there exist such algorithms with $K$ independent of the size of the input tree. Moreover, we provide almost tight bounds for the best value of $K$ as a function of $ε$. In the uniform attachment case we show that the optimal $K$ is subpolynomial in $1/ε$, and that it has to be at least superpolylogarithmic. On the other hand, the preferential attachment case is exponentially harder, as we prove that the best $K$ is polynomial in $1/ε$. We conclude the paper with several open problems.

preprint 2015 arXiv

Multi-scale exploration of convex functions and bandit convex optimization

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function. We use this map to solve a decade-old open problem in adversarial bandit convex optimization by showing that the minimax regret for this problem is $\tilde{O}(\mathrm{poly}(n) \sqrt{T})$, where $n$ is the dimension and $T$ the number of rounds. This bound is obtained by studying the dual Bayesian maximin regret via the information ratio analysis of Russo and Van Roy, and then using the multi-scale exploration to solve the Bayesian problem.

preprint 2015 arXiv

Sampling from a log-concave distribution with Projected Langevin Monte Carlo

We extend the Langevin Monte Carlo (LMC) algorithm to compactly supported measures via a projection step, akin to projected Stochastic Gradient Descent (SGD). We show that (projected) LMC allows sampling in polynomial time from a log-concave distribution with smooth potential. This gives a new Markov chain to sample from a log-concave distribution. Our main result shows in particular that when the target distribution is uniform, LMC mixes in $\tilde{O}(n^7)$ steps (where $n$ is the dimension). We also provide preliminary experimental evidence that LMC performs at least as well as hit-and-run, for which a better mixing time of $\tilde{O}(n^4)$ was proved by Lovász and Vempala.
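As a rough illustration of the projection idea in this abstract, the following sketch runs a projected Langevin chain on a Euclidean ball. The update scaling (a gradient step of size step/2 plus Gaussian noise of variance step), the choice of a ball as the domain, and all function names are assumptions made for this example, not details taken from the abstract.

```python
import math
import random


def project_to_ball(x, radius=1.0):
    """Euclidean projection of the point x onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def projected_lmc(grad_potential, x0, step, num_steps, rng, radius=1.0):
    """Run a projected Langevin chain and return the final iterate.

    Assumed update: x <- P_K(x - (step / 2) * grad f(x) + sqrt(step) * xi),
    where xi is a standard Gaussian vector and P_K projects onto the ball.
    """
    x = project_to_ball(x0, radius)
    for _ in range(num_steps):
        grad = grad_potential(x)
        proposal = [
            xi - 0.5 * step * gi + math.sqrt(step) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)
        ]
        x = project_to_ball(proposal, radius)
    return x


# Uniform target on the unit ball: the potential is constant, so its gradient is zero.
def zero_gradient(x):
    return [0.0] * len(x)


sample = projected_lmc(zero_gradient, [0.0, 0.0, 0.0], step=0.01,
                       num_steps=500, rng=random.Random(0))
```

With a zero gradient the chain is a projected Gaussian random walk, which is the uniform-target case mentioned in the abstract. The step size and number of steps here are arbitrary and carry no mixing guarantee.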

preprint 2015 arXiv

Testing for high-dimensional geometry in random graphs

We study the problem of detecting the presence of an underlying high-dimensional geometric structure in a random graph. Under the null hypothesis, the observed graph is a realization of an Erdős-Rényi random graph $G(n,p)$. Under the alternative, the graph is generated from the $G(n,p,d)$ model, where each vertex corresponds to a latent independent random vector uniformly distributed on the sphere $\mathbb{S}^{d-1}$, and two vertices are connected if the corresponding latent vectors are close enough. In the dense regime (i.e., $p$ is a constant), we propose a near-optimal and computationally efficient testing procedure based on a new quantity which we call signed triangles. The proof of the detection lower bound is based on a new bound on the total variation distance between a Wishart matrix and an appropriately normalized GOE matrix. In the sparse regime, we make a conjecture for the optimal detection boundary. We conclude the paper with some preliminary steps on the problem of estimating the dimension in $G(n,p,d)$.
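A minimal sketch of a signed-triangle count, assuming the statistic is the sum over vertex triples of the product of centered adjacency entries, $\sum_{i<j<k}(A_{ij}-p)(A_{ik}-p)(A_{jk}-p)$. The function name and the dense adjacency-matrix input are choices made for this example; the test threshold itself is not reproduced here.

```python
from itertools import combinations


def signed_triangles(adj, p):
    """Sum over triples i<j<k of (A_ij - p)(A_ik - p)(A_jk - p).

    adj is a symmetric 0/1 adjacency matrix given as a list of lists,
    and p is the edge density under the null hypothesis.
    """
    n = len(adj)
    total = 0.0
    for i, j, k in combinations(range(n), 3):
        total += (adj[i][j] - p) * (adj[i][k] - p) * (adj[j][k] - p)
    return total


# A triangle on three vertices with p = 0.5 contributes (0.5)^3.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
value = signed_triangles(triangle, 0.5)
```

Centering each entry by p gives every triple zero mean under the null hypothesis, whereas a geometric graph tends to close triangles more often and pushes the sum upward.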

preprint 2015 arXiv

The entropic barrier: a simple and optimal universal self-concordant barrier

We prove that the Cramér transform of the uniform measure on a convex body in $\mathbb{R}^n$ is a $(1+o(1)) n$-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski. This gives the first explicit construction of a universal barrier for convex bodies with optimal self-concordance parameter. The proof is based on basic geometry of log-concave distributions, and elementary duality in exponential families.

preprint 2014 arXiv

From trees to seeds: on the inference of the seed from large trees in the uniform attachment model

We study the influence of the seed in random trees grown according to the uniform attachment model, also known as uniform random recursive trees. We show that different seeds lead to different distributions of limiting trees from a total variation point of view. To do this, we construct statistics that measure, in a certain well-defined sense, global "balancedness" properties of such trees. Our paper follows recent results on the same question for the preferential attachment model.

preprint 2014 arXiv

Most Correlated Arms Identification

We study the problem of finding the most mutually correlated arms among many arms. We show that adaptive arms sampling strategies can have significant advantages over the non-adaptive uniform sampling strategy. Our proposed algorithms rely on a novel correlation estimator. The use of this accurate estimator allows us to get improved results for a wide range of problem instances.

preprint 2014 arXiv

On the influence of the seed graph in the preferential attachment model

We study the influence of the seed graph in the preferential attachment model, focusing on the case of trees. We first show that the seed has no effect from a weak local limit point of view. On the other hand, we conjecture that different seeds lead to different distributions of limiting trees from a total variation point of view. We take a first step in proving this conjecture by showing that seeds with different degree profiles lead to different limiting distributions for the (appropriately normalized) maximum degree, implying that such seeds lead to different (in total variation) limiting trees.

preprint 2014 arXiv

On the local profiles of trees

We study the local profiles of trees. We show that, in contrast with the situation for general graphs, the limit set of k-profiles of trees is convex. We initiate a study of the defining inequalities of this convex set. Many challenging problems remain open.

preprint 2013 arXiv

Bounded regret in stochastic multi-armed bandits

We study the stochastic multi-armed bandit problem when one knows the value $μ^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $Δ$. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $Δ$, and bounded regret of order $1/Δ$ is not possible if one only knows $μ^{(\star)}$.

preprint 2013 arXiv

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.

preprint 2013 arXiv

Prior-free and prior-dependent regret bounds for Thompson Sampling

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit as the usual distribution-free and distribution-dependent bounds for the non-Bayesian stochastic bandit. Building on the techniques of Audibert and Bubeck [2009] and Russo and Van Roy [2013] we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14 \sqrt{n K}$. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20} \sqrt{n K}$. We also study the case of priors for the setting of Bubeck et al. [2013] (where the optimal mean is known as well as a lower bound on the smallest gap) and we show that in this case the regret of Thompson Sampling is in fact uniformly bounded over time, thus showing that Thompson Sampling can greatly take advantage of the nice properties of these priors.
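For readers unfamiliar with Thompson Sampling, here is a minimal sketch for one standard special case: Bernoulli rewards with independent uniform Beta(1, 1) priors on each arm. The prior, the reward model, and the function name are assumptions for illustration; the abstract covers general priors and this sketch makes no claim about the constants in its bounds.

```python
import random


def thompson_sampling(true_means, horizon, rng):
    """Play a Bernoulli bandit with Beta(1, 1) priors; return pulls per arm."""
    num_arms = len(true_means)
    successes = [0] * num_arms
    failures = [0] * num_arms
    pulls = [0] * num_arms
    for _ in range(horizon):
        # Draw one sample of each arm's mean from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(num_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(num_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.1, 0.9], horizon=2000, rng=random.Random(0))
```

Each round samples a plausible mean for every arm from the current posterior and acts greedily on the samples, so arms are explored in proportion to the posterior probability that they are optimal.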

preprint 2013 arXiv

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called "semi-bandit" and "bandit" problems. Combining the Mirror Descent algorithm and the INF (Implicitly Normalized Forecaster) strategy, we are able to prove optimal bounds for the semi-bandit case. We also recover the optimal bounds for the full information setting. In the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case. Finally we also prove that the standard exponentially weighted average forecaster is provably suboptimal in the setting of online combinatorial optimization.

preprint 2012 arXiv

Bandits with heavy tail

The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of order $1+ε$, for some $ε\in (0,1]$. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds that also show that the best achievable regret deteriorates when $ε<1$.
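Of the estimators named in this abstract, median-of-means is the simplest to sketch. The version below splits the samples into equal-sized blocks, averages each block, and returns the median of the block averages. The function name, the handling of leftover samples (they are discarded), and the choice of the number of blocks are assumptions for this example, not the paper's tuning.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Median of the means of num_blocks equal-sized consecutive blocks.

    Samples that do not fit into the equal-sized blocks are discarded.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        statistics.fmean(samples[b * block_size:(b + 1) * block_size])
        for b in range(num_blocks)
    ]
    return statistics.median(block_means)


# A single large outlier corrupts only one block, so the median ignores it.
robust_estimate = median_of_means([1, 1, 1, 1, 1000, 1], num_blocks=3)
```

In this example the empirical mean would be 167.5, while the median of the three block means (1, 1 and 500.5) stays at 1, which is the kind of robustness to heavy tails the abstract relies on.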

preprint 2012 arXiv

Detection of correlations

We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.

preprint 2012 arXiv

Multiple Identifications in Multi-Armed Bandits

We study the problem of identifying the top $m$ arms in a multi-armed bandit game. Our proposed solution relies on a new algorithm based on successive rejects of the seemingly bad arms, and successive accepts of the good ones. This algorithmic contribution allows us to tackle other multiple-identification settings that were previously out of reach. In particular we show that this idea of successive accepts and rejects applies to the multi-bandit best arm identification problem.

preprint2012arXiv

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.

preprint2012arXiv

Towards minimax policies for online linear optimization with bandit feedback

We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $\sqrt{d n \log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $\sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d \sqrt{n \log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimensions is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d \sqrt{n}$ regret, thus improving by a factor $\sqrt{d \log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $\sqrt{d n \log n}$ regret, again shaving off an extraneous $\sqrt{d}$ compared to previous works.

preprint2011arXiv

Lipschitz Bandits without the Lipschitz Constant

We consider the setting of stochastic bandit problems with a continuum of arms. We first point out that the strategies considered so far in the literature only provided theoretical guarantees of the form: given some tuning parameters, the regret is small with respect to a class of environments that depends on these parameters. This is however not the right perspective, as it is the strategy that should adapt to the specific bandit environment at hand, and not the other way round. Put differently, an adaptation issue is raised. We solve it for the special case of environments whose mean-payoff functions are globally Lipschitz. More precisely, we show that the minimax optimal order of magnitude $L^{d/(d+2)} \, T^{(d+1)/(d+2)}$ of the regret bound against an environment $f$ with Lipschitz constant $L$ over $T$ time instances can be achieved without knowing $L$ or $T$ in advance. This is in contrast to all previously known strategies, which require some knowledge of $L$ to achieve this performance guarantee.

preprint2011arXiv

Optimal discovery with probabilistic expert advice

We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turing missing mass estimator. We show that this strategy uniformly attains the optimal discovery rate in a macroscopic limit sense, under some assumptions on the probabilistic experts. We also provide numerical experiments suggesting that this optimal behavior may still hold under weaker assumptions.
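For readers unfamiliar with the Good-Turing missing mass estimator mentioned above, the following is a minimal Python sketch of the classical estimator for a single stream of draws: the fraction of observations whose value has been seen exactly once so far. The function name and the convention of returning 1.0 for an empty sample are assumptions made for this sketch. How the paper combines such estimates across several probabilistic experts under the optimistic paradigm is not described in the abstract and is not reproduced here.

```python
from collections import Counter


def good_turing_missing_mass(observations):
    """Estimate the total probability of items not yet observed.

    Classical Good-Turing estimate: (number of distinct items seen
    exactly once) / (number of observations).
    Assumption for this sketch: with no observations, everything is
    unseen, so the estimate is taken to be 1.0.
    """
    n = len(observations)
    if n == 0:
        return 1.0
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# "b" and "c" were each seen once among four draws, so the estimated
# probability that the next draw is a new item is 2 / 4.
draws = ["a", "b", "a", "c"]
estimate = good_turing_missing_mass(draws)
```

An estimate close to 0 suggests that further draws from this source are unlikely to reveal anything new, which is the kind of signal a discovery strategy could use to decide which expert to query next.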

preprint2011arXiv

X-Armed Bandits

We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches.

preprint2010arXiv

Pure Exploration for Multi-Armed Bandit Problems

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case when the cumulative regret is considered and when exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations when the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.
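The two regret notions contrasted above can be written out as follows. The notation is an assumption made for this note and is not taken from the abstract: $\mu_i$ is the mean reward of arm $i$, $\mu^* = \max_i \mu_i$, $I_t$ is the arm pulled at round $t$, and $J_n$ is the arm recommended after $n$ rounds.

$$
R_n = \sum_{t=1}^{n} \left( \mu^* - \mu_{I_t} \right),
\qquad
r_n = \mu^* - \mu_{J_n}.
$$

Under these definitions the cumulative regret $R_n$ charges every pull made during the game, whereas the simple regret $r_n$ charges only the final recommendation, so exploration rounds are free apart from using up the budget of $n$ rounds.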

40 published item(s)

First-Order Bayesian Regret Analysis of Thompson Sampling

Complexity of Highly Parallel Non-Smooth Convex Optimization

Coordination without communication: optimal regret in two players multi-armed bandits

How to trap a gradient flow

Metrical Service Systems with Transformations

Online Learning for Active Cache Synchronization

Online Multiserver Convex Chasing and Optimization

Asymptotic behavior of the Eden model with positively homogeneous edge weights

Basic models and questions in statistical network analysis

Black-box optimization with a politician

Kernel-based methods for bandit convex optimization

On paths, stars and wyes in trees

A geometric alternative to Nesterov's accelerated gradient descent

Bandit Convex Optimization: sqrt{T} Regret in One Dimension

Convex Optimization: Algorithms and Complexity

Detecting Markov Random Fields Hidden in White Noise

Detecting positive correlations in a multivariate sample

Exceptional rotations of random graphs: a VC theory

Finding Adam in random growing trees

Multi-scale exploration of convex functions and bandit convex optimization

Sampling from a log-concave distribution with Projected Langevin Monte Carlo

Testing for high-dimensional geometry in random graphs

The entropic barrier: a simple and optimal universal self-concordant barrier

From trees to seeds: on the inference of the seed from large trees in the uniform attachment model

Most Correlated Arms Identification

On the influence of the seed graph in the preferential attachment model

On the local profiles of trees

Bounded regret in stochastic multi-armed bandits

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Prior-free and prior-dependent regret bounds for Thompson Sampling

Regret in Online Combinatorial Optimization

Bandits with heavy tail

Detection of correlations

Multiple Identifications in Multi-Armed Bandits

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Towards minimax policies for online linear optimization with bandit feedback

Lipschitz Bandits without the Lipschitz Constant

Optimal discovery with probabilistic expert advice

X-Armed Bandits

Pure Exploration for Multi-Armed Bandit Problems