Source author record

Olivier Teytaud

Olivier Teytaud appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence math.OC Neural and Evolutionary Computing Computer Science and Game Theory Biological Physics Computation and Language Cryptography and Security physics.optics

Catalog footprint

What is connected

18works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying data, raising privacy and security concerns such as susceptibility to data poisoning attacks. In contrast, black box optimization methods, which treat the model as an opaque function, relying solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide non-vacuous generalization bounds and strong theoretical guarantees for privacy, robustness to data poisoning attacks, and extraction attacks. In experiments with LLMs, we demonstrate empirically that black-box optimization methods, despite the scalability and computational challenges inherent to black-box approaches, are able to learn, showing how a few iterations of BBoxER improve performance, generalize well on a benchmark of reasoning datasets, and are robust to membership inference attacks. This positions BBoxER as an attractive add-on on top of gradient-based optimization, offering suitability for deployment in restricted or privacy-sensitive environments while also providing non-vacuous generalization guarantees.

preprint2024arXiv

Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym

The growing ubiquity of machine learning (ML) has led it to enter various areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, as they are used for hyperparameter optimization and more generally for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the budget of evaluations increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. Therefore, we urge to understand whether cross-fertilization is possible between the two communities, ML and BBO, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Comparative experiments often involve rather small benchmarks and show visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific setting of hyperparameters, and low statistical significance. With this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit with a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.

preprint2022arXiv

Improving Nevergrad's Algorithm Selection Wizard NGOpt through Automated Algorithm Configuration

Algorithm selection wizards are effective and versatile tools that automatically select an optimization algorithm given high-level information about the problem and available computational resources, such as number and type of decision variables, maximal number of evaluations, possibility to parallelize evaluations, etc. State-of-the-art algorithm selection wizards are complex and difficult to improve. We propose in this work the use of automated configuration methods for improving their performance by finding better configurations of the algorithms that compose them. In particular, we use elitist iterated racing (irace) to find CMA configurations for specific artificial benchmarks that replace the hand-crafted CMA configurations currently used in the NGOpt wizard provided by the Nevergrad platform. We discuss in detail the setup of irace for the purpose of generating configurations that work well over the diverse set of problem instances within each benchmark. Our approach improves the performance of the NGOpt wizard, even on benchmark suites that were not part of the tuning by irace.

preprint2021arXiv

Black-Box Optimization Revisited: Improving Algorithm Selection Wizards through Massive Benchmarking

Existing studies in black-box optimization for machine learning suffer from low generalizability, caused by a typically selective choice of problem instances used for training and testing different optimization algorithms. Among other issues, this practice promotes overfitting and poor-performing user guidelines. To address this shortcoming, we propose in this work a benchmark suite, OptimSuite, which covers a broad range of black-box optimization problems, ranging from academic benchmarks to real-world applications, from discrete over numerical to mixed-integer problems, from small to very large-scale problems, from noisy over dynamic to static problems, etc. We demonstrate the advantages of such a broad collection by deriving from it Automated Black Box Optimizer (ABBO), a general-purpose algorithm selection wizard. Using three different types of algorithm selection techniques, ABBO achieves competitive performance on all benchmark suites. It significantly outperforms previous state of the art on some of them, including YABBOB and LSGO. ABBO relies on many high-quality base components. Its excellent performance is obtained without any task-specific parametrization. The OptimSuite benchmark collection, the ABBO wizard and its base solvers have all been merged into the open-source Nevergrad platform, where they are available for reproducible research.

preprint2021arXiv

Deep Learning for General Game Playing with Ludii and Polygames

Combinations of Monte-Carlo tree search and Deep Neural Networks, trained through self-play, have produced state-of-the-art results for automated game-playing in many board games. The training and search algorithms are not game-specific, but every individual game that these approaches are applied to still requires domain knowledge for the implementation of the game's rules, and constructing the neural network's architecture -- in particular the shapes of its input and output tensors. Ludii is a general game system that already contains over 500 different games, which can rapidly grow thanks to its powerful and user-friendly game description language. Polygames is a framework with training and search algorithms, which has already produced superhuman players for several board games. This paper describes the implementation of a bridge between Ludii and Polygames, which enables Polygames to train and evaluate models for games that are implemented and run through Ludii. We do not require any game-specific domain knowledge anymore, and instead leverage our domain knowledge of the Ludii system and its abstract state and move representations to write functions that can automatically determine the appropriate shapes for input and output tensors for any game implemented in Ludii. We describe experimental results for short training runs in a wide variety of different board games, and discuss several open problems and avenues for future research.

preprint2020arXiv

On averaging the best samples in evolutionary computation

Choosing the right selection rate is a long standing issue in evolutionary computation. In the continuous unconstrained case, we prove mathematically that a single parent $μ=1$ leads to a sub-optimal simple regret in the case of the sphere function. We provide a theoretically-based selection rate $μ/λ$ that leads to better progress rates. With our choice of selection rate, we get a provable regret of order $O(λ^{-1})$ which has to be compared with $O(λ^{-2/d})$ in the case where $μ=1$. We complete our study with experiments to confirm our theoretical claims.

preprint2020arXiv

Polygames: Improved Zero Learning

Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19x19, which was often said to be untractable for zero learning; and in Havannah. We also won several first places at the TAAI competitions.

preprint2020arXiv

Population Control meets Doob's Martingale Theorems: the Noise-free Multimodal Case

We study a test-based population size adaptation (TBPSA) method, inspired from population control, in the noise-free multimodal case. In the noisy setting, TBPSA usually recommends, at the end of the run, the center of the Gaussian as an approximation of the optimum. We show that combined with a more naive recommendation, namely recommending the visited point which had the best fitness value so far, TBPSA is also powerful in the noise-free multimodal context. We demonstrate this experimentally and explore this mechanism theoretically: we prove that TBPSA is able to escape plateaus with probability one in spite of the fact that it can converge to local minima. This leads to an algorithm effective in the multimodal setting without resorting to a random restart from scratch.

preprint2020arXiv

Variance Reduction for Better Sampling in Continuous Domains

Design of experiments, random search, initialization of population-based methods, or sampling inside an epoch of an evolutionary algorithm use a sample drawn according to some probability distribution for approximating the location of an optimum. Recent papers have shown that the optimal search distribution, used for the sampling, might be more peaked around the center of the distribution than the prior distribution modelling our uncertainty about the location of the optimum. We confirm this statement, provide explicit values for this reshaping of the search distribution depending on the population size $λ$ and the dimension $d$, and validate our results experimentally.

preprint2020arXiv

Versatile Black-Box Optimization

Choosing automatically the right algorithm using problem descriptors is a classical component of combinatorial optimization. It is also a good tool for making evolutionary algorithms fast, robust and versatile. We present Shiwa, an algorithm good at both discrete and continuous, noisy and noise-free, sequential and parallel, black-box optimization. Our algorithm is experimentally compared to competitors on YABBOB, a BBOB comparable testbed, and on some variants of it, and then validated on several real world testbeds.

preprint2018arXiv

Evolutionary algorithms converge towards evolved biological photonic structures

Nature features a plethora of extraordinary photonic architectures that have been optimized through natural evolution. While numerical optimization is increasingly and successfully used in photonics, it has yet to replicate any of these complex naturally occurring structures. Using evolutionary algorithms directly inspired by natural evolution, we have retrieved emblematic natural photonic structures, indicating how such regular structures might have spontaneously emerged in nature and to which precise optical or fabrication constraints they respond. Comparisons between algorithms show that recombination between individuals inspired by sexual reproduction confers a clear advantage in this context of modular problems and suggest further ways to improve the algorithms. Such an in silico evolution can also suggest original and elegant solutions to practical problems, as illustrated by the design of counter-intuitive anti-reflective coating for solar cells.

preprint2016arXiv

Analysis of Different Types of Regret in Continuous Noisy Optimization

The performance measure of an algorithm is a crucial part of its analysis. The performance can be determined by the study on the convergence rate of the algorithm in question. It is necessary to study some (hopefully convergent) sequence that will measure how "good" is the approximated optimum compared to the real optimum. The concept of Regret is widely used in the bandit literature for assessing the performance of an algorithm. The same concept is also used in the framework of optimization algorithms, sometimes under other names or without a specific name. And the numerical evaluation of convergence rate of noisy algorithms often involves approximations of regrets. We discuss here two types of approximations of Simple Regret used in practice for the evaluation of algorithms for noisy optimization. We use specific algorithms of different nature and the noisy sphere function to show the following results. The approximation of Simple Regret, termed here Approximate Simple Regret, used in some optimization testbeds, fails to estimate the Simple Regret convergence rate. We also discuss a recent new approximation of Simple Regret, that we term Robust Simple Regret, and show its advantages and disadvantages.

preprint2016arXiv

Automatically Reinforcing a Game AI

A recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger, program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available - by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parame- ters; it performs quite well against the original GPP, but performs poorly against an opponent which repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a "one game" test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPP repeatedly and progressively switches to the best one - using a bandit algorithm.

preprint2016arXiv

Learning opening books in partially observable games: using random seeds in Phantom Go

Many artificial intelligences (AIs) are randomized. One can be lucky or unlucky with the random seed; we quantify this effect and show that, maybe contrarily to intuition, this is far from being negligible. Then, we apply two different existing algorithms for selecting good seeds and good probability distributions over seeds. This mainly leads to learning an opening book. We apply this to Phantom Go, which, as all phantom games, is hard for opening book learning. We improve the winning rate from 50% to 70% in 5x5 against the same AI, and from approximately 0% to 40% in 5x5, 7x7 and 9x9 against a stronger (learning) opponent.

preprint2016arXiv

Noisy Optimization: Fast Convergence Rates with Comparison-Based Algorithms

Derivative Free Optimization is known to be an efficient and robust method to tackle the black-box optimization problem. When it comes to noisy functions, classical comparison-based algorithms are slower than gradient-based algorithms. For quadratic functions, Evolutionary Algorithms without large mutations have a simple regret at best $O(1/ \sqrt{N})$ when $N$ is the number of function evaluations, whereas stochastic gradient descent can reach (tightly) a simple regret in $O(1/N)$. It has been conjectured that gradient approximation by finite differences (hence, not a comparison-based method) is necessary for reaching such a $O(1/N)$. We answer this conjecture in the negative, providing a comparison-based algorithm as good as gradient methods, i.e. reaching $O(1/N)$ - under the condition, however, that the noise is Gaussian. Experimental results confirm the $O(1/N)$ simple regret, i.e., squared rate compared to many published results at $O(1/\sqrt{N})$.

preprint2015arXiv

Algorithm Portfolios for Noisy Optimization

Noisy optimization is the optimization of objective functions corrupted by noise. A portfolio of solvers is a set of solvers equipped with an algorithm selection tool for distributing the computational power among them. Portfolios are widely and successfully used in combinatorial optimization. In this work, we study portfolios of noisy optimization solvers. We obtain mathematically proved performance (in the sense that the portfolio performs nearly as well as the best of its solvers) by an ad hoc portfolio algorithm dedicated to noisy optimization. A somehow surprising result is that it is better to compare solvers with some lag, i.e., propose the current recommendation of best solver based on their performance earlier in the run. An additional finding is a principled method for distributing the computational power among solvers in the portfolio.

preprint2015arXiv

Depth, balancing, and limits of the Elo model

-Much work has been devoted to the computational complexity of games. However, they are not necessarily relevant for estimating the complexity in human terms. Therefore, human-centered measures have been proposed, e.g. the depth. This paper discusses the depth of various games, extends it to a continuous measure. We provide new depth results and present tool (given-first-move, pie rule, size extension) for increasing it. We also use these measures for analyzing games and opening moves in Y, NoGo, Killall Go, and the effect of pie rules.

preprint2014arXiv

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness comparatively to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-art risk-aware MAB algorithms on artificial and real-world problems.

Olivier Teytaud

What is connected

Connect this record

See the researcher in context

Building this map preview

18 published item(s)

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym

Improving Nevergrad's Algorithm Selection Wizard NGOpt through Automated Algorithm Configuration

Black-Box Optimization Revisited: Improving Algorithm Selection Wizards through Massive Benchmarking

Deep Learning for General Game Playing with Ludii and Polygames

On averaging the best samples in evolutionary computation

Polygames: Improved Zero Learning

Population Control meets Doob's Martingale Theorems: the Noise-free Multimodal Case

Variance Reduction for Better Sampling in Continuous Domains

Versatile Black-Box Optimization

Evolutionary algorithms converge towards evolved biological photonic structures

Analysis of Different Types of Regret in Continuous Noisy Optimization

Automatically Reinforcing a Game AI

Learning opening books in partially observable games: using random seeds in Phantom Go

Noisy Optimization: Fast Convergence Rates with Comparison-Based Algorithms

Algorithm Portfolios for Noisy Optimization

Depth, balancing, and limits of the Elo model

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits