Source author record

Wouter M. Koolen

Wouter M. Koolen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Machine Learning; Computer Science and Game Theory; Genomics; Information Theory; math.IT; Methodology; q-fin.PR; Quantitative Methods

Catalog footprint

What is connected

12 works

8 topics

4 close collaborators

Preprint, 2021, arXiv

Regret Minimization in Heavy-Tailed Bandits

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded-support reward distributions or distributions that belong to a single-parameter exponential family. We work under the much weaker assumption that the moments of order $(1+ε)$ are uniformly bounded by a known constant $B$, for some given $ε > 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.
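For context on the baseline mentioned above, here is a minimal sketch of a truncated empirical mean under the moment bound $B$ and exponent $ε$. It is not the index proposed in the paper. The threshold form $(B t / \log(1/δ))^{1/(1+ε)}$ is one common choice from the heavy-tailed bandit literature and is an assumption here, as are the function name `truncated_mean` and the confidence parameter `delta`.

```python
import math

def truncated_mean(samples, B, eps, delta):
    """Truncated empirical mean (illustrative baseline, not the paper's index).

    The t-th sample is kept only if its magnitude is at most
    (B * t / log(1/delta)) ** (1 / (1 + eps)); otherwise it counts as 0.
    """
    n = len(samples)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        # Assumed threshold form; grows with t so fewer samples are cut later.
        threshold = (B * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / n

# One huge outlier dominates the plain mean but is discarded by truncation.
data = [1.0] * 99 + [1e6]
plain = sum(data) / len(data)
robust = truncated_mean(data, B=2.0, eps=1.0, delta=0.05)
```

The truncation trades a small bias (early samples above the threshold are also dropped) for protection against rare extreme rewards.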

Preprint, 2020, arXiv

Lipschitz and Comparator-Norm Adaptivity in Online Learning

We study Online Convex Optimization in the unbounded setting where neither predictions nor gradients are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reductions to the unbounded setting; one to not need hints, and a second to deal with the range ratio problem (which already arises in prior work). We discuss their optimality in light of prior and new lower bounds. We apply our methods to obtain sharper regret bounds for scale-invariant online prediction with linear models.

Preprint, 2020, arXiv

Structure Adaptive Algorithms for Stochastic Bandits

We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods that are flexible (in that they easily adapt to different structures), powerful (in that they perform well empirically and/or provably match instance-dependent lower bounds) and efficient in that the per-round computational burden is small. We develop asymptotically optimal algorithms from instance-dependent lower-bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from the estimation of the sub-optimality gaps and their reciprocals. Still we manage to achieve all the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB.

Preprint, 2016, arXiv

Online Isotonic Regression

We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $O\big(T^{1/3} \log^{2/3}(T)\big)$ and present a matching $Ω(T^{1/3})$ lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to $O(\log T)$ or even to $O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss.
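The comparator in this regret, the best isotonic function in hindsight under squared loss, can be computed offline with the classical pool-adjacent-violators procedure. The sketch below shows that offline computation only, not the online algorithm from the paper. It assumes the labels are already listed in the order of their positions, and the name `isotonic_fit` is illustrative.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators.

    Assumes y is ordered by position. Returns one fitted value per label.
    """
    blocks = []  # each block is [sum of labels, number of labels]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit

def squared_loss(y, f):
    """Total squared loss of predictions f against labels y."""
    return sum((a - b) ** 2 for a, b in zip(y, f))

labels = [1, 3, 2, 4]
best = isotonic_fit(labels)  # the violating pair (3, 2) is pooled to its mean
```

The learner's regret is then its own total squared loss minus `squared_loss(labels, best)`.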

Preprint, 2016, arXiv

Robust Probability Updating

This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffice to determine an updated probability distribution. This paper thus addresses a generalization of that problem to arbitrary distributions on finite outcome spaces, arbitrary sets of 'messages', and (almost) arbitrary loss functions, and provides existence and characterization theorems for robust probability updating strategies. We find that for logarithmic loss, optimality is characterized by an elegant condition, which we call RCAR (reverse coarsening at random). Under certain conditions, the same condition also characterizes optimality for a much larger class of loss functions, and we obtain an objective and general answer to how one should update probabilities in the light of new information.

Preprint, 2015, arXiv

Second-order Quantile Methods for Experts and Combinatorial Games

We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of 'easy data', which may be paraphrased as "the learning problem has small variance" and "multiple decisions are useful", are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. In this paper we outline a new method for obtaining such adaptive algorithms, based on a potential function that aggregates a range of learning rates (which are essential tuning parameters). By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.

Preprint, 2013, arXiv

Universal Codes from Switching Strategies

We discuss algorithms for combining sequential prediction strategies, a task which can be viewed as a natural generalisation of the concept of universal coding. We describe a graphical language based on Hidden Markov Models for defining prediction strategies, and we provide both existing and new models as examples. The models include efficient, parameterless models for switching between the input strategies over time, including a model for the case where switches tend to occur in clusters, and finally a new model for the scenario where the prediction strategies have a known relationship, and where jumps are typically between strongly related ones. This last model is relevant for coding time series data where parameter drift is expected. As theoretical contributions we introduce an interpolation construction that is useful in the development and analysis of new algorithms, and we establish a new sophisticated lemma for analysing the individual sequence regret of parameterised models.

Preprint, 2011, arXiv

Adaptive Hedge

Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods.
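To make the role of the learning rate concrete, here is a minimal sketch of Hedge with a fixed learning rate. This is the baseline the abstract contrasts with, not the adaptive tuning the paper proposes. The function name `hedge` and the list-of-loss-vectors input format are illustrative assumptions.

```python
import math

def hedge(loss_rounds, eta):
    """Run Hedge with a fixed learning rate eta.

    loss_rounds is a list of loss vectors, one entry per action.
    Returns (regret against the best action, weights used in the last round).
    """
    k = len(loss_rounds[0])
    cum_loss = [0.0] * k      # cumulative loss of each action so far
    learner_loss = 0.0
    weights = [1.0 / k] * k
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss).
        m = min(cum_loss)     # shift by the minimum for numerical stability
        raw = [math.exp(-eta * (c - m)) for c in cum_loss]
        total = sum(raw)
        weights = [r / total for r in raw]
        # The learner suffers the weighted average of the action losses.
        learner_loss += sum(w * l for w, l in zip(weights, losses))
        cum_loss = [c + l for c, l in zip(cum_loss, losses)]
    return learner_loss - min(cum_loss), weights

# An "easy" instance: action 0 is always strictly better than action 1.
rounds = [[0.0, 1.0]] * 100
regret, weights = hedge(rounds, eta=1.0)
```

On this easy instance a large fixed `eta` keeps the regret small, whereas a rate tuned for the worst case would shrink with the horizon and learn more slowly; that gap is what an adaptive learning rate aims to close.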

Preprint, 2011, arXiv

Probability-free pricing of adjusted American lookbacks

Consider an American option that pays $G(X^*_t)$ when exercised at time $t$, where $G$ is a positive increasing function, $X^*_t := \sup_{s\le t}X_s$, and $X_s$ is the price of the underlying security at time $s$. Assuming zero interest rates, we show that the seller of this option can hedge his position by trading in the underlying security if he begins with initial capital $X_0\int_{X_0}^{\infty}G(x)x^{-2}\,dx$ (and this is the smallest initial capital that allows him to hedge his position). This leads to strategies for trading that are always competitive both with a given strategy's current performance and, to a somewhat lesser degree, with its best performance so far. It also leads to methods of statistical testing that avoid sacrificing too much of the maximum statistical significance that they achieve in the course of accumulating data.
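As a worked instance of the capital formula, take the hypothetical payoff $G(x) = x^{p}$ with $0 < p < 1$. This choice is made here for illustration and is not taken from the paper. The integral then has a closed form:

$$X_0\int_{X_0}^{\infty} x^{p}\,x^{-2}\,dx = X_0\left[\frac{x^{p-1}}{p-1}\right]_{X_0}^{\infty} = \frac{X_0^{p}}{1-p}.$$

For $p = 1/2$ this gives $2\sqrt{X_0}$, twice the payoff $G(X_0) = \sqrt{X_0}$ obtained by exercising immediately. For $p \ge 1$ the integral diverges, so under this formula no finite initial capital suffices.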

Preprint, 2010, arXiv

Freezing and Sleeping: Tracking Experts that Learn by Evolving Past Posteriors

A problem posed by Freund is how to efficiently track a small pool of experts out of a much larger set. This problem was solved when Bousquet and Warmuth introduced their mixing past posteriors (MPP) algorithm in 2001. In Freund's problem the experts would normally be considered black boxes. However, in this paper we re-examine Freund's problem in case the experts have internal structure that enables them to learn. In this case the problem has two possible interpretations: should the experts learn from all data or only from the subsequence on which they are being tracked? The MPP algorithm solves the first case. Our contribution is to generalise MPP to address the second option. The results we obtain apply to any expert structure that can be formalised using (expert) hidden Markov models. Curiously enough, for our interpretation there are two natural reference schemes: freezing and sleeping. For each scheme, we provide an efficient prediction strategy and prove the relevant loss bound.

Preprint, 2010, arXiv

Some mathematical refinements concerning error minimization in the genetic code

The genetic code has been shown to be very error robust compared to randomly selected codes, but to be significantly less error robust than a certain code found by a heuristic algorithm. We formulate this optimisation problem as a Quadratic Assignment Problem and thus verify that the code found by the heuristic is the global optimum. We also argue that it is strongly misleading to compare the genetic code only with codes sampled from the fixed block model, because the real code space is orders of magnitude larger. We thus enlarge the space from which random codes can be sampled from approximately $2.433 \times 10^{18}$ codes to approximately $5.908 \times 10^{45}$ codes. We do this by leaving the fixed block model, and using the wobble rules to formulate the characteristics acceptable for a genetic code. By relaxing more constraints three larger spaces are also constructed. Using a modified error function, the genetic code is found to be more error robust compared to a background of randomly generated codes with increasing space size. We point out that these results do not necessarily imply that the code was optimized during evolution for error minimization, but that other mechanisms could explain this error robustness.

Preprint, 2010, arXiv

Switching between Hidden Markov Models using Fixed Share

In prediction with expert advice the goal is to design online prediction algorithms that achieve small regret (additional loss on the whole data) compared to a reference scheme. In the simplest such scheme one compares to the loss of the best expert in hindsight. A more ambitious goal is to split the data into segments and compare to the best expert on each segment. This is appropriate if the nature of the data changes between segments. The standard fixed-share algorithm is fast and achieves small regret compared to this scheme. Fixed share treats the experts as black boxes: there are no assumptions about how they generate their predictions. But if the experts are learning, the following question arises: should the experts learn from all data or only from data in their own segment? The original algorithm naturally addresses the first case. Here we consider the second option, which is more appropriate exactly when the nature of the data changes between segments. In general extending fixed share to this second case will slow it down by a factor of T on T outcomes. We show, however, that no such slowdown is necessary if the experts are hidden Markov models.
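For reference, the standard black-box fixed-share update can be sketched as follows. This is the baseline algorithm with experts treated as black boxes, not the hidden Markov model extension from the paper. The sketch uses the variant that redistributes a fraction `alpha` of the weight uniformly over all experts; the function name `fixed_share`, the learning rate `eta`, and the loss-vector input format are illustrative assumptions.

```python
import math

def fixed_share(loss_rounds, eta, alpha):
    """Fixed share over black-box experts (uniform-sharing variant).

    Returns (learner's cumulative loss, weights after the final update).
    """
    k = len(loss_rounds[0])
    w = [1.0 / k] * k
    learner_loss = 0.0
    for losses in loss_rounds:
        # The learner suffers the weighted average of the expert losses.
        learner_loss += sum(wi * li for wi, li in zip(w, losses))
        # Loss update: exponentially down-weight experts by their loss.
        v = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
        total = sum(v)
        # Share update: keep (1 - alpha) of the posterior and spread alpha
        # uniformly, so an expert that was bad earlier can recover quickly.
        w = [(1.0 - alpha) * vi / total + alpha / k for vi in v]
    return learner_loss, w

# Two segments: expert 0 is best on the first half, expert 1 on the second.
rounds = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50
loss, w = fixed_share(rounds, eta=1.0, alpha=0.05)
```

Each expert alone loses 50 on this sequence, while the sharing step lets the weights move to expert 1 within a few rounds of the change.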
