Source author record

Olivier Catoni

Olivier Catoni appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.ST, Statistics Theory, Machine Learning, math.PR, Computation and Language, math.OC

Catalog footprint: 9 works, 6 topics, 4 close collaborators


Preprint, 2022, arXiv

Constant payoff in zero-sum stochastic games

In a zero-sum stochastic game, at each stage, two adversarial players make decisions and receive a stage payoff determined by them and by a controlled random variable representing the state of nature. The total payoff is the normalized discounted sum of the stage payoffs. In this paper we solve the "constant payoff" conjecture formulated by Sorin, Vigeral and Venel (2010): if both players use optimal strategies, then for any $\alpha > 0$, the expected discounted payoff between stage 1 and stage $\alpha/\lambda$ tends to the limit discounted value of the game, as the discount rate $\lambda$ goes to 0.
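The quantities in this statement can be written out as follows. The symbols $g_m$ (payoff at stage $m$) and $\gamma_\lambda$ are notation assumed here for illustration; the abstract does not fix them. The weight identity is elementary, but how the partial sum is normalized in the conjecture is not stated in the abstract, so this is a sketch of the objects involved rather than the paper's exact formulation.

```latex
% Normalized \lambda-discounted payoff (g_m: stage payoff, assumed notation)
\gamma_\lambda = \mathbb{E}\Bigl[ \sum_{m \ge 1} \lambda (1-\lambda)^{m-1} g_m \Bigr]

% Total discount weight carried by stages 1 to \lfloor \alpha/\lambda \rfloor
\sum_{m=1}^{\lfloor \alpha/\lambda \rfloor} \lambda (1-\lambda)^{m-1}
  = 1 - (1-\lambda)^{\lfloor \alpha/\lambda \rfloor}
  \xrightarrow[\lambda \to 0]{} 1 - e^{-\alpha}
```

A fixed fraction $1 - e^{-\alpha}$ of the total weight therefore sits in the first $\alpha/\lambda$ stages, which is why that horizon is the natural time scale as $\lambda \to 0$.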

Preprint, 2021, arXiv

New bounds for $k$-means and information $k$-means

In this paper, we derive a new dimension-free non-asymptotic upper bound for the quadratic $k$-means excess risk related to the quantization of an i.i.d. sample in a separable Hilbert space. We improve the bound of order $\mathcal{O} \bigl( k / \sqrt{n} \bigr)$ of Biau, Devroye and Lugosi, recovering the rate $\sqrt{k/n}$ that has already been proved by Fefferman, Mitter, and Narayanan and by Klochkov, Kroshnin and Zhivotovskiy but with worse log factors and constants. More precisely, we bound the mean excess risk of an empirical minimizer by the explicit upper bound $16 B^2 \log(n/k) \sqrt{k \log(k) / n}$, in the bounded case when $\mathbb{P}( \lVert X \rVert \leq B) = 1$. This is essentially optimal up to logarithmic factors since a lower bound of order $\mathcal{O} \bigl( \sqrt{k^{1 - 4/d}/n} \bigr)$ is known in dimension $d$. Our technique of proof is based on the linearization of the $k$-means criterion through a kernel trick and on PAC-Bayesian inequalities. To get a $1 / \sqrt{n}$ rate, we introduce a new PAC-Bayesian chaining method replacing the concept of $\delta$-net with the perturbation of the parameter by an infinite dimensional Gaussian process. In the meantime, we embed the usual $k$-means criterion into a broader family built upon the Kullback divergence and its underlying properties. This results in a new algorithm that we named information $k$-means, well suited to the clustering of bags of words. Based on considerations from information theory, we also introduce a new bounded $k$-means criterion that uses a scale parameter but satisfies a generalization bound that does not require any boundedness or even integrability conditions on the sample. We describe the counterpart of Lloyd's algorithm and prove generalization bounds for these new $k$-means criteria.
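As a point of reference for the quadratic $k$-means criterion and Lloyd's algorithm mentioned in this abstract, here is a minimal sketch of the classical Euclidean Lloyd iteration in plain Python. It is not the information $k$-means or bounded $k$-means variant introduced in the paper, and the function names (`lloyd_kmeans`, `empirical_risk`) and parameters are hypothetical choices made for this illustration.

```python
import random


def squared_distance(x, c):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def empirical_risk(points, centers):
    """Quadratic k-means criterion: mean squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


def lloyd_kmeans(points, k, init=None, n_iter=100, seed=0):
    """Classical Lloyd iteration (illustrative sketch, not the paper's algorithm).

    `init` optionally provides k starting centers; otherwise k distinct
    sample points are drawn at random.
    """
    rng = random.Random(seed)
    if init is None:
        centers = [list(p) for p in rng.sample(points, k)]
    else:
        centers = [list(c) for c in init]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[t] for m in members) / len(members) for t in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # fixed point reached
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    fitted = lloyd_kmeans(data, k=2, init=[(0, 0), (10, 10)])
    print(fitted, empirical_risk(data, fitted))
```

Lloyd's iteration never increases the empirical criterion but may stop at a local minimum, whereas the bound quoted in the abstract is stated for an empirical minimizer.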

Preprint, 2016, arXiv

Markov substitute processes : a new model for linguistics and beyond

We introduce Markov substitute processes, a new model at the crossroads of statistics and formal grammars, and prove its main property: Markov substitute processes with a given support form an exponential family.

Preprint, 2016, arXiv

PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design

The topics discussed in this paper take their origin in the estimation of the Gram matrix of a random vector from a sample made of $n$ independent copies. They comprise the estimation of the covariance matrix and the study of least squares regression with a random design. We propose four types of results, based on non-asymptotic PAC-Bayesian generalization bounds: a new robust estimator of the Gram matrix and of the covariance matrix, new results on the empirical Gram matrix, new robust least squares estimators and new results on the ordinary least squares estimator, including its exact rate of convergence under polynomial moment assumptions.

Preprint, 2013, arXiv

Toric grammars: a new statistical approach to natural language modeling

We propose a new statistical model for computational linguistics. Rather than trying to estimate directly the probability distribution of a random sentence of the language, we define a Markov chain on finite sets of sentences with many finite recurrent communicating classes and define our language model as the invariant probability measures of the chain on each recurrent communicating class. This Markov chain, which we call a communication model, randomly recombines at each step the set of sentences forming its current state, using some grammar rules. When the grammar rules are fixed and known in advance instead of being estimated on the fly, we can prove additional mathematical properties. In particular, we can prove in this case that all states are recurrent states, so that the chain defines a partition of its state space into finite recurrent communicating classes. We show that our approach is a decisive departure from Markov models at the sentence level and discuss its relationships with Context-Free Grammars. Although the toric grammars we use are closely related to Context-Free Grammars, the way we generate the language from the grammar is qualitatively different. Our communication model has two purposes. On the one hand, it is used to define indirectly the probability distribution of a random sentence of the language. On the other hand, it can serve as a (crude) model of language transmission from one speaker to another through the communication of a (large) set of sentences.

Preprint, 2012, arXiv

Robust linear least squares regression

We consider the problem of robustly predicting as well as the best linear combination of $d$ given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For the ridge estimator and the ordinary least squares estimator, and their variants, we provide new risk bounds of order $d/n$ without logarithmic factor unlike some standard results, where $n$ is the size of the training data. We also provide a new estimator with better deviations in the presence of heavy-tailed noise. It is based on truncating differences of losses in a min--max framework and satisfies a $d/n$ risk bound both in expectation and in deviations. The key common surprising factor of these results is the absence of exponential moment condition on the output distribution while achieving exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Experimental results strongly back up our truncated min--max estimator.
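For orientation, the two baseline estimators named in this abstract, ordinary least squares and ridge, can be sketched in a few lines of plain Python. This is only the textbook construction, assuming the penalized criterion $\frac{1}{n}\sum_i (y_i - \langle \theta, \varphi(x_i) \rangle)^2 + \lambda \lVert \theta \rVert^2$; it is not the truncated min-max estimator proposed in the paper, and the names `ridge_fit` and `solve_linear_system` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular or ill-conditioned system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(features, targets, lam=0.0):
    """Ridge estimator for the assumed criterion; lam=0.0 gives ordinary least squares.

    features: list of d-dimensional feature vectors (the d given functions
    evaluated at each input); targets: list of real outputs.
    """
    n, d = len(features), len(features[0])
    # Empirical Gram matrix (1/n) sum_i phi_i phi_i^T, plus lam on the diagonal.
    gram = [
        [sum(f[i] * f[j] for f in features) / n + (lam if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(f[i] * y for f, y in zip(features, targets)) / n for i in range(d)]
    return solve_linear_system(gram, rhs)


if __name__ == "__main__":
    feats = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
    ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2 x
    print(ridge_fit(feats, ys))           # ordinary least squares
    print(ridge_fit(feats, ys, lam=1.0))  # ridge, shrunk toward zero
```

Both estimators depend on the sample only through the empirical Gram matrix and the empirical cross-moments, which is where heavy-tailed outputs can hurt and what robust alternatives aim to control.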

Preprint, 2011, arXiv

Challenging the empirical mean and empirical variance: a deviation study

We present new M-estimators of the mean and variance of real-valued random variables, based on PAC-Bayes bounds. We analyze the non-asymptotic minimax properties of the deviations of those estimators for sample distributions having either a bounded variance or a bounded variance and a bounded kurtosis. Under those weak hypotheses, allowing for heavy-tailed distributions, we show that the worst case deviations of the empirical mean are suboptimal. We prove indeed that for any confidence level, there is some M-estimator whose deviations are of the same order as the deviations of the empirical mean of a Gaussian statistical sample, even when the statistical sample is instead heavy-tailed. Experiments reveal that these new estimators perform even better than predicted by our bounds, showing deviation quantile functions uniformly lower at all probability levels than the empirical mean for non-Gaussian sample distributions as simple as the mixture of two Gaussian measures.
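The abstract does not spell out the estimator, so the following is a minimal sketch of an M-estimator of the mean under stated assumptions: the estimate $\hat\theta$ solves $\sum_i \psi(\alpha (Y_i - \theta)) = 0$ with the logarithmic influence function $\psi(x) = \log(1 + x + x^2/2)$ for $x \ge 0$ and $\psi(x) = -\log(1 - x + x^2/2)$ for $x < 0$, which is the choice usually associated with this paper. The scale `alpha` is left as a free parameter here (its tuning from the variance and confidence level is not reproduced), and the function names are hypothetical.

```python
import math


def log_influence(x):
    """Odd, increasing influence function with logarithmic growth (assumed choice)."""
    if x >= 0:
        return math.log(1.0 + x + 0.5 * x * x)
    return -math.log(1.0 - x + 0.5 * x * x)


def m_estimate_mean(sample, alpha, n_steps=200):
    """Solve sum_i psi(alpha * (y_i - theta)) = 0 for theta by bisection.

    The score is strictly decreasing in theta, non-negative at min(sample)
    and non-positive at max(sample), so the root lies in that interval.
    """
    def score(theta):
        return sum(log_influence(alpha * (y - theta)) for y in sample)

    lo, hi = min(sample), max(sample)
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root is to the right of mid
        else:
            hi = mid  # root is at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    sample = [-1.0, -0.5, 0.0, 0.5, 1.0, 1000.0]  # one gross outlier
    print("empirical mean:", sum(sample) / len(sample))
    print("M-estimate    :", m_estimate_mean(sample, alpha=1.0))
```

Because $\psi$ grows only logarithmically, a single extreme observation shifts the estimate far less than it shifts the empirical mean, which is the qualitative behavior the abstract describes for heavy-tailed samples.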

Preprint, 2011, arXiv

Linear regression through PAC-Bayesian truncation

We consider the problem of predicting as well as the best linear combination of $d$ given functions in least squares regression under $L^\infty$ constraints on the linear combination. When the input distribution is known, there already exists an algorithm having an expected excess risk of order $d/n$, where $n$ is the size of the training data. Without this strong assumption, standard results often contain a multiplicative $\log(n)$ factor, complex constants involving the conditioning of the Gram matrix of the covariates, kurtosis coefficients or some geometric quantity characterizing the relation between $L^2$ and $L^\infty$-balls and require some additional assumptions like exponential moments of the output. This work provides a PAC-Bayesian shrinkage procedure with a simple excess risk bound of order $d/n$ holding in expectation and in deviations, under various assumptions. The common surprising factor of these results is their simplicity and the absence of exponential moment condition on the output distribution while achieving exponential deviations. The risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. We also show that these results can be generalized to other strongly convex loss functions.

Preprint, 2010, arXiv

Risk bounds in linear regression through PAC-Bayesian truncation

We consider the problem of predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. When the input distribution is known, there already exists an algorithm having an expected excess risk of order d/n, where n is the size of the training data. Without this strong assumption, standard results often contain a multiplicative log n factor, and require some additional assumptions like uniform boundedness of the d-dimensional input representation and exponential moments of the output. This work provides new risk bounds for the ridge estimator and the ordinary least squares estimator, and their variants. It also provides shrinkage procedures with convergence rate d/n (i.e., without the logarithmic factor) in expectation and in deviations, under various assumptions. The key common surprising factor of these results is the absence of exponential moment condition on the output distribution while achieving exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Finally, we show that some of these results are not particular to the least squares loss, but can be generalized to similar strongly convex loss functions.
