Source author record

Adrien Saumard

Adrien Saumard appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Statistics Theory, Machine Learning, Methodology

Catalog footprint

What is connected

7 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Phase transitions for support recovery under local differential privacy

We address the problem of variable selection in a high-dimensional but sparse mean model, under the additional constraint that only privatised data are available for inference. The original data are vectors with independent entries having a symmetric, strongly log-concave distribution on $\mathbb{R}$. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $\alpha$-differential privacy. We provide lower and upper bounds on the rate of convergence for the expected Hamming loss over classes of at most $s$-sparse vectors whose non-zero coordinates are separated from $0$ by a constant $a>0$. As corollaries, we derive necessary and sufficient conditions (up to log factors) for exact recovery and for almost full recovery. When we restrict our attention to non-interactive mechanisms that act independently on each coordinate, our lower bound shows that, contrary to the non-private setting, both exact and almost full recovery are impossible whatever the value of $a$ in the high-dimensional regime such that $n\alpha^2/d^2\lesssim 1$. However, in the regime $n\alpha^2/d^2\gg \log(d)$ we can exhibit a critical value $a^*$ (up to a logarithmic factor) such that exact and almost full recovery are possible for all $a\gg a^*$ and impossible for $a\leq a^*$. We show that these results can be improved when allowing for all non-interactive locally $\alpha$-differentially private mechanisms (which may act globally on all coordinates), in the sense that phase transitions occur at lower levels.
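To make "non-interactive mechanisms that act independently on each coordinate" concrete, here is a minimal sketch of one standard mechanism of that kind: clip each coordinate and add Laplace noise, splitting the privacy budget $\alpha$ evenly across the $d$ coordinates. This is an illustration under stated assumptions, not necessarily the mechanism analysed in the paper; the names `privatise`, `laplace` and `clip` are hypothetical.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a centred Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatise(x, alpha, clip, rng):
    """Release a noisy copy of x, one coordinate at a time.

    Each coordinate is clipped to [-clip, clip] (sensitivity 2 * clip) and
    receives Laplace noise calibrated to a per-coordinate budget alpha / d,
    so that the whole vector is alpha-locally differentially private by
    basic composition. (Assumed construction, for illustration only.)
    """
    d = len(x)
    scale = 2.0 * clip * d / alpha
    return [max(-clip, min(clip, v)) + laplace(scale, rng) for v in x]

rng = random.Random(0)
x = [0.8, 0.0, 0.0, 0.0]          # one non-zero coordinate, d = 4
n = 20000                          # number of individuals, all holding x here
releases = [privatise(x, alpha=2.0, clip=1.0, rng=rng) for _ in range(n)]
averages = [sum(z[j] for z in releases) / n for j in range(len(x))]
```

In this sketch the noise variance per coordinate is $8\,\mathrm{clip}^2 d^2/\alpha^2$, so an average over $n$ individuals fluctuates on the order of $d/(\alpha\sqrt{n})$. That is only a heuristic reading of why a ratio such as $n\alpha^2/d^2$ would govern whether a signal of size $a$ can be detected.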

Preprint, 2022, arXiv

Relaxing the Gaussian assumption in Shrinkage and SURE in high dimension

Shrinkage estimation is a fundamental tool of modern statistics, pioneered by Charles Stein upon his discovery of the famous paradox involving the multivariate Gaussian. A large portion of the subsequent literature only considers the efficiency of shrinkage, and that of an associated procedure known as Stein's Unbiased Risk Estimate, or SURE, in the Gaussian setting of that original work. We investigate what extensions to the domain of validity of shrinkage and SURE can be made away from the Gaussian through the use of tools developed in the probabilistic area now known as Stein's method. We show that shrinkage is efficient away from the Gaussian under very mild conditions on the distribution of the noise. SURE is also proved to be adaptive under similar assumptions, and in particular in a way that retains the classical asymptotics of Pinsker's theorem. Notably, shrinkage and SURE are shown to be efficient under mild distributional assumptions, and particularly for general isotropic log-concave measures.
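As a point of reference for the abstract above, the following sketch simulates the classical Gaussian setting that the paper starts from: the James-Stein shrinkage estimator and its Stein's Unbiased Risk Estimate, with unit noise variance. The dimension, the mean vector `theta` and the number of repetitions are arbitrary choices for illustration, and nothing here reproduces the paper's non-Gaussian results.

```python
import random

def james_stein(y):
    """James-Stein estimator for y ~ N(theta, I_d), d >= 3."""
    d = len(y)
    norm_sq = sum(v * v for v in y)
    shrink = 1.0 - (d - 2) / norm_sq
    return [shrink * v for v in y]

def sure_james_stein(y):
    """SURE for the James-Stein estimator: d - (d - 2)^2 / ||y||^2."""
    d = len(y)
    norm_sq = sum(v * v for v in y)
    return d - (d - 2) ** 2 / norm_sq

def sq_loss(estimate, theta):
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))

rng = random.Random(0)
d, reps = 20, 2000
theta = [0.5] * d                  # hypothetical true mean vector

mle_losses, js_losses, sure_values = [], [], []
for _ in range(reps):
    y = [t + rng.gauss(0.0, 1.0) for t in theta]
    mle_losses.append(sq_loss(y, theta))            # no shrinkage
    js_losses.append(sq_loss(james_stein(y), theta))
    sure_values.append(sure_james_stein(y))

mle_risk = sum(mle_losses) / reps
js_risk = sum(js_losses) / reps
avg_sure = sum(sure_values) / reps
```

Averaged over repetitions, `js_risk` should fall below `mle_risk`, and `avg_sure` should track `js_risk` without ever using `theta`. That is the property SURE is designed to have in the Gaussian case.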

Preprint, 2020, arXiv

K-bMOM: a robust Lloyd-type clustering algorithm based on bootstrap Median-of-Means

We propose a new clustering algorithm that is robust to the presence of outliers in the dataset. We perform Lloyd-type iterations with robust estimates of the centroids. More precisely, we build on the idea of median-of-means statistics to estimate the centroids, but allow for replacement while constructing the blocks. We call this methodology the bootstrap median-of-means (bMOM) and prove that if enough blocks are generated through the bootstrap sampling, then it has a better breakdown point for mean estimation than the classical median-of-means (MOM), where the blocks form a partition of the dataset. From a clustering perspective, bMOM makes it possible to take many blocks of a desired size, thus avoiding the possible disappearance of clusters in some blocks, a pitfall that can occur for the partition-based generation of blocks of the classical median-of-means. Experiments on simulated datasets show that the proposed approach, called K-bMOM, performs better than existing robust K-means based methods. Guidelines are provided for tuning the hyper-parameters of K-bMOM in practice. Practitioners are also advised to use such a robust approach to initialize their clustering algorithm. Finally, considering a simplified and theoretical version of our estimator, we prove its robustness to adversarial contamination by deriving robust rates of convergence for the K-means distortion. To our knowledge, it is the first result of this kind for the K-means distortion.
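The difference between the two block constructions described above can be sketched for one-dimensional mean estimation. The code below is a simplified illustration, not the K-bMOM clustering algorithm itself; the function names, block counts and block sizes are hypothetical choices.

```python
import random
import statistics

def mom_mean(data, n_blocks, rng):
    """Classical median-of-means: the blocks form a partition of the data."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)

def bmom_mean(data, n_blocks, block_size, rng):
    """Bootstrap median-of-means: each block is drawn with replacement,
    so the number of blocks and their size can be chosen independently."""
    block_means = [
        statistics.fmean(rng.choices(data, k=block_size))
        for _ in range(n_blocks)
    ]
    return statistics.median(block_means)

rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(200)]
contaminated = clean + [1000.0] * 5     # five gross outliers

naive = statistics.fmean(contaminated)  # pulled far away from 0
mom = mom_mean(contaminated, n_blocks=11, rng=rng)
bmom = bmom_mean(contaminated, n_blocks=101, block_size=10, rng=rng)
```

With a partition, the block size is tied to the number of blocks (here about 18 points per block), whereas the bootstrap version can use many small blocks. In this toy setting most bootstrap blocks contain no outlier, so the median of the block means stays close to the uncontaminated mean.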

Preprint, 2016, arXiv

On optimality of empirical risk minimization in linear aggregation

In the first part of this paper, we show that the small-ball condition, recently introduced by Mendelson (2015), may behave poorly for important classes of localized functions such as wavelets, piecewise polynomials or trigonometric polynomials, in particular leading to suboptimal estimates of the rate of convergence of ERM for the linear aggregation problem. In the second part, we recover optimal rates of convergence for the excess risk of ERM when the dictionary is made of trigonometric functions. Considering the bounded case, we derive the concentration of the excess risk around a single point, which is far more precise information than the rate of convergence. In the general setting of an L2 noise, we finally refine the small-ball argument by carefully selecting the directions we are looking at, in such a way that we obtain optimal rates of aggregation for the Fourier dictionary.

Preprint, 2015, arXiv

Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression

We consider the estimation of a bounded regression function with nonparametric heteroscedastic noise and random design. We study the true and empirical excess risks of the least-squares estimator on finite-dimensional vector spaces. We give upper and lower bounds on these quantities that are nonasymptotic and optimal to first order, allowing the dimension to depend on sample size. These bounds show the equivalence between the true and empirical excess risks when, among other things, the least-squares estimator is consistent in sup-norm with the projection of the regression function onto the considered model. Consistency in the sup-norm is then proved for suitable histogram models and more general models of piecewise polynomials that are endowed with a localized basis structure.

Preprint, 2015, arXiv

The Slope Heuristics in Heteroscedastic Regression

We consider the estimation of a regression function with random design and heteroscedastic noise in a nonparametric setting. More precisely, we address the problem of characterizing the optimal penalty when the regression function is estimated by using a penalized least-squares model selection method. In this context, we show the existence of a minimal penalty, defined to be the maximum level of penalization under which the model selection procedure totally misbehaves. The optimal penalty is shown to be twice the minimal one and to satisfy a non-asymptotic pathwise oracle inequality with leading constant almost one. Finally, the ideal penalty being unknown in general, we propose a hold-out penalization procedure and show that the latter is asymptotically optimal.

Preprint, 2014, arXiv

Log-concavity and strong log-concavity: a review

We review and formulate results concerning log-concavity and strong log-concavity in both discrete and continuous settings. We show how preservation of log-concavity and strong log-concavity on $\mathbb{R}$ under convolution follows from a fundamental monotonicity result of Efron (1969). We provide a new proof of Efron's theorem using the recent asymmetric Brascamp-Lieb inequality due to Otto and Menz (2013). Along the way we review connections between log-concavity and other areas of mathematics and statistics, including concentration of measure, log-Sobolev inequalities, convex geometry, MCMC algorithms, Laplace approximations, and machine learning.