Source author record

Yannick Baraud

Yannick Baraud appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST Statistics Theory math.PR

Catalog footprint

What is connected

7 works

3 topics

4 close collaborators

preprint 2024 arXiv

From robust tests to Bayes-like posterior distributions

In the Bayes paradigm and for a given loss function, we propose the construction of a new type of posterior distributions, that extends the classical Bayes one, for estimating the law of an $n$-sample. The loss functions we have in mind are based on the total variation and Hellinger distances as well as some $\mathbb{L}_{j}$-ones. We prove that, with a probability close to one, this new posterior distribution concentrates its mass in a neighbourhood of the law of the data, for the chosen loss function, provided that this law belongs to the support of the prior or, at least, lies close enough to it. We therefore establish that the new posterior distribution enjoys some robustness properties with respect to a possible misspecification of the prior, or more precisely, its support. For the total variation and squared Hellinger losses, we also show that the posterior distribution keeps its concentration properties when the data are only independent, hence not necessarily i.i.d., provided that most of their marginals or the average of these are close enough to some probability distribution around which the prior puts enough mass. The posterior distribution is therefore also stable with respect to the equidistribution assumption. We illustrate these results by several applications. We consider the problems of estimating a location parameter or both the location and the scale of a density in a nonparametric framework. Finally, we also tackle the problem of estimating a density, with the squared Hellinger loss, in a high-dimensional parametric model under some sparsity conditions. The results established in this paper are non-asymptotic and provide, as much as possible, explicit constants.

preprint 2022 arXiv

Robust estimation of a regression function in exponential families

We observe $n$ pairs of independent (but not necessarily i.i.d.) random variables $X_{1}=(W_{1},Y_{1}),\ldots,X_{n}=(W_{n},Y_{n})$ and tackle the problem of estimating the conditional distributions $Q_{i}^{\star}(w_{i})$ of $Y_{i}$ given $W_{i}=w_{i}$ for all $i\in\{1,\ldots,n\}$. Even though these might not be true, we base our estimator on the assumptions that the data are i.i.d. and the conditional distributions of $Y_{i}$ given $W_{i}=w_{i}$ belong to a one-parameter exponential family $\bar{\mathscr{Q}}$ with parameter space given by an interval $I$. More precisely, we pretend that these conditional distributions take the form $Q_{\boldsymbol{\theta}(w_{i})}\in \bar{\mathscr{Q}}$ for some $\boldsymbol{\theta}$ that belongs to a VC-class $\bar{\boldsymbol{\Theta}}$ of functions with values in $I$. For each $i\in\{1,\ldots,n\}$, we estimate $Q_{i}^{\star}(w_{i})$ by a distribution of the same form, i.e. $Q_{\hat{\boldsymbol{\theta}}(w_{i})}\in \bar{\mathscr{Q}}$, where $\hat{\boldsymbol{\theta}}=\hat{\boldsymbol{\theta}}(X_{1},\ldots,X_{n})$ is a well-chosen estimator with values in $\bar{\boldsymbol{\Theta}}$. We show that our estimation strategy is robust to model misspecification, contamination and the presence of outliers. Besides, we provide an algorithm for calculating $\hat{\boldsymbol{\theta}}$ when $\bar{\boldsymbol{\Theta}}$ is a VC-class of functions of low or moderate dimension and we carry out a simulation study to compare the performance of $\hat{\boldsymbol{\theta}}$ to that of the MLE and median-based estimators.

preprint 2020 arXiv

Robust Bayes-Like Estimation: Rho-Bayes estimation

We consider the problem of estimating the joint distribution $P$ of $n$ independent random variables within the Bayes paradigm from a non-asymptotic point of view. Assuming that $P$ admits some density $s$ with respect to a given reference measure, we consider a density model $\overline{S}$ for $s$ that we endow with a prior distribution $\pi$ (with support $\overline{S}$) and we build a robust alternative to the classical Bayes posterior distribution which possesses similar concentration properties around $s$ whenever it belongs to the model $\overline{S}$. Furthermore, in density estimation, the Hellinger distance between the classical and the robust posterior distributions tends to 0, as the number of observations tends to infinity, under suitable assumptions on the model and the prior, provided that the model $\overline{S}$ contains the true density $s$. However, unlike what happens with the classical Bayes posterior distribution, we show that the concentration properties of this new posterior distribution are still preserved in the case of a misspecification of the model, that is when $s$ does not belong to $\overline{S}$ but is close enough to it with respect to the Hellinger distance.

preprint 2015 arXiv

Bounding the expectation of the supremum of an empirical process over a (weak) VC-major class

Given a bounded class of functions $G$ and independent random variables $X_{1},\ldots,X_{n}$, we provide an upper bound for the expectation of the supremum of the empirical process over elements of $G$ having a small variance. Our bound applies in the cases where $G$ is a VC-subgraph or a VC-major class and it is of smaller order than those one could get by using a universal entropy bound over the whole class $G$. It also involves explicit constants and does not require the knowledge of the entropy of $G$.

preprint 2013 arXiv

Estimating composite functions by model selection

We consider the problem of estimating a function $s$ on $[-1,1]^{k}$ for large values of $k$ by looking for some best approximation by composite functions of the form $g\circ u$. Our solution is based on model selection and leads to a very general approach to solve this problem with respect to many different types of functions $g,u$ and statistical frameworks. In particular, we handle the problems of approximating $s$ by additive functions, single and multiple index models, neural networks, mixtures of Gaussian densities (when $s$ is a density) among other examples. We also investigate the situation where $s=g\circ u$ for functions $g$ and $u$ belonging to possibly anisotropic smoothness classes. In this case, our approach leads to a completely adaptive estimator with respect to the regularity of $s$.

preprint 2013 arXiv

Estimation of the density of a determinantal process

We consider the problem of estimating the density $\Pi$ of a determinantal process $N$ from the observation of $n$ independent copies of it. We use an aggregation procedure based on robust testing to build our estimator. We establish non-asymptotic risk bounds with respect to the Hellinger loss and deduce, when $n$ goes to infinity, uniform rates of convergence over classes of densities $\Pi$ of interest.

preprint 2011 arXiv

Estimator selection in the Gaussian setting

We consider the problem of estimating the mean $f$ of a Gaussian vector $Y$ with independent components of common unknown variance $\sigma^{2}$. Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection $\mathbb{F}$ of estimators of $f$ based on $Y$ and, with the same data $Y$, aim at selecting an estimator among $\mathbb{F}$ with the smallest Euclidean risk. No assumptions on the estimators are made and their dependencies with respect to $Y$ may be unknown. We establish a non-asymptotic risk bound for the selected estimator. As particular cases, our approach allows us to handle the problems of aggregation and model selection as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. We also derive oracle-type inequalities when $\mathbb{F}$ consists of linear estimators. For illustration, we carry out two simulation studies. One aims at comparing our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.