Source author record

Guillaume Lecué

Guillaume Lecué appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: math.ST, Statistics Theory, math.PR, Information Theory, math.IT, Applications, econ.EM

Catalog footprint: 25 works, 7 topics, 4 close collaborators

preprint · 2026 · arXiv

Statistical Inference in Large Multi-way Networks

We propose a new method to estimate structural parameters in multi-way networks while controlling for rich structures of fixed effects. The method is based on a series of classification tasks and is agnostic to both the number and structure of fixed effects. In contrast to full maximum likelihood approaches, our estimator does not suffer from the incidental parameter problem. For sparsely connected networks, it is also computationally faster than PPML. We provide empirical evidence that our estimator yields more reliable confidence intervals than PPML and its bias-correction strategies. These improvements hold even under model misspecification and are more pronounced in sparse settings. While PPML remains competitive in dense, low-dimensional data, our approach offers a robust alternative for multi-way models that scales efficiently with sparsity. The method is applied to study the causal effect of a policy reform on spatial accessibility to health care in France.

preprint · 2021 · arXiv

On the robustness to adversarial corruption and to heavy-tailed data of the Stahel-Donoho median of means

We consider median of means (MOM) versions of the Stahel-Donoho outlyingness (SDO) [Stahel 1981, Donoho 1982] and of Median Absolute Deviation (MAD) functions to construct subgaussian estimators of a mean vector under adversarial contamination and heavy-tailed data. We develop a single analysis of the MOM version of the SDO which covers all cases ranging from the Gaussian case to the L2 case. It is based on isomorphic and almost isometric properties of the MOM versions of SDO and MAD. This analysis also covers cases where the mean does not even exist but a location parameter does; in those cases we still recover the same subgaussian rates and the same price for adversarial contamination even though there is not even a first moment. These properties are achieved by the classical SDO median and are therefore the first non-asymptotic statistical bounds on the Stahel-Donoho median complementing the $\sqrt{n}$-consistency [Maronna 1995] and asymptotic normality [Zuo, Cui, He, 2004] of the Stahel-Donoho estimators. We also show that the MOM version of MAD can be used to construct an estimator of the covariance matrix under only an L2-moment assumption or of a scale parameter if a second moment does not exist.
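
The median-of-means principle behind these estimators can be illustrated in one dimension. The sketch below is only the basic univariate MOM estimator of a mean, not the Stahel-Donoho construction analysed in the paper; the function name `median_of_means` and the use of contiguous, equal-size blocks are assumptions made for illustration.

```python
import statistics


def median_of_means(xs, n_blocks):
    """Basic univariate median-of-means estimate of a mean (illustrative).

    The sample is cut into n_blocks contiguous blocks of equal size
    (trailing observations that do not fill a block are dropped),
    each block is averaged, and the median of the block means is returned.
    """
    if not 1 <= n_blocks <= len(xs):
        raise ValueError("n_blocks must be between 1 and len(xs)")
    block_size = len(xs) // n_blocks
    block_means = [
        sum(xs[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)


# Nine clean observations and one gross outlier.
sample = [1.0] * 9 + [1000.0]
print(sum(sample) / len(sample))   # empirical mean: 100.9
print(median_of_means(sample, 5))  # block means 1, 1, 1, 1, 500.5 -> 1.0
```

A single corrupted observation can spoil at most one block mean, so the median over blocks is unaffected as long as fewer than half of the blocks are contaminated.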

preprint · 2021 · arXiv

Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms

We consider the problem of robust mean and location estimation w.r.t. any pseudo-norm of the form $x\in\mathbb{R}^d\to \|x\|_S = \sup_{v\in S}\langle v,x\rangle$ where $S$ is any symmetric subset of $\mathbb{R}^d$. We show that the deviation-optimal minimax subgaussian rate for confidence $1-δ$ is $$ \max\left(\frac{l^*(Σ^{1/2}S)}{\sqrt{N}}, \sup_{v\in S}\|Σ^{1/2}v\|_2\sqrt{\frac{\log(1/δ)}{N}}\right) $$ where $l^*(Σ^{1/2}S)$ is the Gaussian mean width of $Σ^{1/2}S$ and $Σ$ the covariance of the data (in the benchmark i.i.d. Gaussian case). This improves the entropic minimax lower bound from [Lugosi and Mendelson, 2019] and closes the gap characterized by Sudakov's inequality between the entropy and the Gaussian mean width for this problem. This shows that the right statistical complexity measure for the mean estimation problem is the Gaussian mean width. We also show that this rate can be achieved by a solution to a convex optimization problem in the adversarial and $L_2$ heavy-tailed setup by considering minimum of some Fenchel-Legendre transforms constructed using the Median-of-means principle. We finally show that this rate may also be achieved in situations where there is not even a first moment but a location parameter exists.
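
As a worked instance of the rate above, one can take $S$ to be the Euclidean unit ball $B_2^d$, so that $\|x\|_S=\|x\|_2$. The derivation below is a sketch under that assumption, using the standard definition of the Gaussian mean width, $l^*(T)=\mathbb{E}\sup_{t\in T}\langle t,G\rangle$ with $G$ a standard Gaussian vector in $\mathbb{R}^d$ (a definition not spelled out in the abstract), and Jensen's inequality.

```latex
% Assumptions: S = B_2^d (Euclidean unit ball), G ~ N(0, I_d),
% l^*(T) = E sup_{t in T} <t, G>  (standard Gaussian mean width).
\[
  l^*(\Sigma^{1/2} B_2^d)
    = \mathbb{E}\sup_{v \in B_2^d} \langle \Sigma^{1/2} v, G\rangle
    = \mathbb{E}\,\|\Sigma^{1/2} G\|_2
    \le \sqrt{\mathbb{E}\,\|\Sigma^{1/2} G\|_2^2}
    = \sqrt{\operatorname{Tr}(\Sigma)},
  \qquad
  \sup_{v \in B_2^d} \|\Sigma^{1/2} v\|_2 = \sqrt{\|\Sigma\|_{\mathrm{op}}}.
\]
\[
  \max\left(\frac{l^*(\Sigma^{1/2} B_2^d)}{\sqrt{N}},\;
            \sqrt{\frac{\|\Sigma\|_{\mathrm{op}}\log(1/\delta)}{N}}\right)
  \le
  \max\left(\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}},\;
            \sqrt{\frac{\|\Sigma\|_{\mathrm{op}}\log(1/\delta)}{N}}\right).
\]
```

In this special case the first term is a dimension-dependent complexity term and the second a deviation term governed by the largest eigenvalue of the covariance.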

preprint · 2021 · arXiv

Robust high dimensional learning for Lipschitz and convex losses

We establish risk bounds for Regularized Empirical Risk Minimizers (RERM) when the loss is Lipschitz and convex and the regularization function is a norm. In a first part, we obtain these results in the i.i.d. setup under subgaussian assumptions on the design. In a second part, a more general framework where the design might have heavier tails and data may be corrupted by outliers both in the design and the response variables is considered. In this situation, RERM performs poorly in general. We analyse an alternative procedure based on median-of-means principles and called minmax MOM. We show optimal subgaussian deviation rates for these estimators in the relaxed setting. The main results are meta-theorems allowing a wide range of applications to various problems in learning theory. To give a non-exhaustive sample of these potential applications, we apply them to classification problems with logistic loss functions regularized by LASSO and SLOPE, and to regression problems with Huber loss regularized by Group LASSO and Total Variation. Another advantage of the minmax MOM formulation is that it suggests a systematic way to slightly modify descent-based algorithms used in high-dimensional statistics to make them robust to outliers. We illustrate this principle in a Simulations section where classical proximal descent algorithms are turned, via their minmax MOM versions, into algorithms that are robust to outliers.

preprint · 2020 · arXiv

Learning with Semi-Definite Programming: new statistical bounds based on fixed point analysis and excess risk curvature

Many statistical learning problems have recently been shown to be amenable to Semi-Definite Programming (SDP), with community detection and clustering in Gaussian mixture models as the most striking instances [Javanmard et al., 2016]. Given the growing range of applications of SDP-based techniques to machine learning problems, and the rapid progress in the design of efficient algorithms for solving SDPs, an intriguing question is to understand how the recent advances from empirical process theory can be put to work in order to provide a precise statistical analysis of SDP estimators. In the present paper, we borrow cutting-edge techniques and concepts from the learning theory literature, such as fixed point equations and excess risk curvature arguments, which yield general estimation and prediction results for a wide class of SDP estimators. From this perspective, we revisit some classical results in community detection from [Guédon et al., 2016] and [Chen et al., 2016], and we obtain statistical guarantees for SDP estimators used in signed clustering, group synchronization and MAXCUT.

preprint · 2017 · arXiv

Regularization and the small-ball method I: sparse recovery

We obtain bounds on estimation error rates for regularization procedures of the form \begin{equation*} \hat f \in {\rm argmin}_{f\in F}\left(\frac{1}{N}\sum_{i=1}^N\left(Y_i-f(X_i)\right)^2+λΨ(f)\right) \end{equation*} when $Ψ$ is a norm and $F$ is convex. Our approach gives a common framework that may be used in the analysis of learning problems and regularization problems alike. In particular, it sheds some light on the role various notions of sparsity have in regularization and on their connection with the size of subdifferentials of $Ψ$ in a neighbourhood of the true minimizer. As a "proof of concept", we extend the known estimates for the LASSO, SLOPE and trace norm regularization.

preprint · 2016 · arXiv

Learning subgaussian classes : Upper and minimax bounds

We obtain sharp oracle inequalities for the empirical risk minimization procedure in the regression model under the assumption that the target Y and the model F are subgaussian. The bound we obtain is sharp in the minimax sense if F is convex. Moreover, under mild assumptions on F, the error rate of ERM remains optimal even if the procedure is allowed to perform with constant probability. A part of our analysis is a new proof of minimax results for the Gaussian regression model.

preprint · 2016 · arXiv

Performance of empirical risk minimization in linear aggregation

We study conditions under which, given a dictionary $F=\{f_1,\ldots ,f_M\}$ and an i.i.d. sample $(X_i,Y_i)_{i=1}^N$, the empirical minimizer in $\operatorname {span}(F)$ relative to the squared loss, satisfies that with high probability \[R\bigl(\tilde{f}^{\mathrm{ERM}}\bigr)\leq\inf_{f\in\operatorname {span}(F)}R(f)+r_N(M),\] where $R(\cdot)$ is the squared risk and $r_N(M)$ is of the order of $M/N$. Among other results, we prove that a uniform small-ball estimate for functions in $\operatorname {span}(F)$ is enough to achieve that goal when the noise is independent of the design.

preprint · 2016 · arXiv

Regularization and the small-ball method II: complexity dependent error rates

For a convex class of functions $F$, a regularization function $Ψ(\cdot)$ and given the random data $(X_i, Y_i)_{i=1}^N$, we study estimation properties of regularization procedures of the form \begin{equation*} \hat f \in {\rm argmin}_{f\in F}\Big(\frac{1}{N}\sum_{i=1}^N\big(Y_i-f(X_i)\big)^2+λΨ(f)\Big) \end{equation*} for some well chosen regularization parameter $λ$. We obtain bounds on the $L_2$ estimation error rate that depend on the complexity of the "true model" $F^*:=\{f\in F: Ψ(f)\leqΨ(f^*)\}$, where $f^*\in {\rm argmin}_{f\in F}\mathbb{E}(Y-f(X))^2$ and the $(X_i,Y_i)$'s are independent and distributed as $(X,Y)$. Our estimate holds under weak stochastic assumptions -- one of which being a small-ball condition satisfied by $F$ -- and for rather flexible choices of regularization functions $Ψ(\cdot)$. Moreover, the result holds in the learning theory framework: we do not assume any a-priori connection between the output $Y$ and the input $X$. As a proof of concept, we apply our general estimation bound to various choices of $Ψ$, for example, the $\ell_p$ and $S_p$-norms (for $p\geq1$), weak-$\ell_p$, atomic norms, max-norm and SLOPE. In many cases, the estimation rate almost coincides with the minimax rate in the class $F^*$.

preprint · 2015 · arXiv

On the gap between RIP-properties and sparse recovery conditions

We consider the problem of recovering sparse vectors from underdetermined linear measurements via $\ell_p$-constrained basis pursuit. Previous analyses of this problem based on generalized restricted isometry properties have suggested that two phenomena occur if $p\neq 2$. First, one may need substantially more than $s \log(en/s)$ measurements (optimal for $p=2$) for uniform recovery of all $s$-sparse vectors. Second, the matrix that achieves recovery with the optimal number of measurements may not be Gaussian (as for $p=2$). We present a new, direct analysis which shows that in fact neither of these phenomena occurs. Via a suitable version of the null space property we show that a standard Gaussian matrix provides $\ell_q/\ell_1$-recovery guarantees for $\ell_p$-constrained basis pursuit in the optimal measurement regime. Our result extends to several heavier-tailed measurement matrices. As an application, we show that one can obtain a consistent reconstruction from uniform scalar quantized measurements in the optimal measurement regime.

preprint · 2015 · arXiv

Sparse recovery under weak moment assumptions

We prove that i.i.d. random vectors that satisfy a rather weak moment assumption can be used as measurement vectors in Compressed Sensing, and the number of measurements required for exact reconstruction is the same as the best possible estimate -- exhibited by a random Gaussian matrix. We also prove that this moment condition is necessary, up to a $\log \log $ factor. Applications to the Compatibility Condition and the Restricted Eigenvalue Condition in the noisy setup and to properties of neighbourly random polytopes are also discussed.

preprint · 2014 · arXiv

Necessary moment conditions for exact reconstruction via basis pursuit

Let $X=(x_1,...,x_n)$ be a random vector that satisfies a weak small ball property and whose coordinates $x_i$ satisfy that $\|x_i\|_{L_p} \lesssim \sqrt{p} \|x_i\|_{L_2}$ for $p \sim \log n$. In \cite{LM_compressed}, it was shown that $N$ independent copies of $X$ can be used as measurement vectors in Compressed Sensing (using the basis pursuit algorithm) to reconstruct any $d$-sparse vector with the optimal number of measurements $N\gtrsim d \log\big(e n/d\big)$. In this note we show that the result is almost optimal. We construct a random vector $X$ with i.i.d., mean-zero, variance one coordinates that satisfies the same weak small ball property and whose coordinates satisfy that $\|x_i\|_{L_p} \lesssim \sqrt{p} \|x_i\|_{L_2}$ for $p \sim (\log n)/(\log N)$, but the basis pursuit algorithm fails to recover even $1$-sparse vectors. The construction shows that "spiky" measurement vectors may lead to a poor performance by the basis pursuit algorithm, but on the other hand may still perform in an optimal way if one chooses a different reconstruction algorithm (like $\ell_0$-minimization). This exhibits the fact that the convex relaxation of $\ell_0$-minimization comes at a significant cost when using "spiky" measurement vectors.

preprint · 2014 · arXiv

Optimal learning with $Q$-aggregation

We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with the prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann. Statist. 40 (2012) 1878-1905] for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the $Q$-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques somewhat depart from traditional proofs by carrying out most of the standard arguments on the Laplace transform of the empirical process to be controlled.

preprint · 2013 · arXiv

Empirical risk minimization is optimal for the convex aggregation problem

Let $F$ be a finite model of cardinality $M$ and denote by $\operatorname{conv}(F)$ its convex hull. The problem of convex aggregation is to construct a procedure having a risk as close as possible to the minimal risk over $\operatorname{conv}(F)$. Consider the bounded regression model with respect to the squared risk denoted by $R(\cdot)$. If ${\widehat{f}}_n^{\mathit{ERM-C}}$ denotes the empirical risk minimization procedure over $\operatorname{conv}(F)$, then we prove that for any $x>0$, with probability greater than $1-4\exp(-x)$, \[R({\widehat{f}}_n^{\mathit{ERM-C}})\leq\min_{f\in \operatorname{conv}(F)}R(f)+c_0\max \biggl(ψ_n^{(C)}(M),\frac{x}{n}\biggr),\] where $c_0>0$ is an absolute constant and $ψ_n^{(C)}(M)$ is the optimal rate of convex aggregation defined in [Computational Learning Theory and Kernel Machines (COLT-2003), Springer (2003), 303-313] by $ψ_n^{(C)}(M)=M/n$ when $M\leq \sqrt{n}$ and $ψ_n^{(C)}(M)=\sqrt{\log (\mathrm{e}M/\sqrt{n})/n}$ when $M>\sqrt{n}$.
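
The two regimes of the rate $ψ_n^{(C)}(M)$ can be evaluated directly. The following is a minimal sketch of the formula as stated in the abstract; the function name `convex_aggregation_rate` is hypothetical and the absolute constant $c_0$ is left out.

```python
import math


def convex_aggregation_rate(M, n):
    """Rate psi_n^(C)(M) as given in the abstract (illustrative helper).

    M: cardinality of the dictionary, n: sample size.
    """
    if M < 1 or n < 1:
        raise ValueError("M and n must be at least 1")
    if M <= math.sqrt(n):
        # Small dictionaries: parametric-type rate M / n.
        return M / n
    # Large dictionaries: sqrt(log(e * M / sqrt(n)) / n).
    return math.sqrt(math.log(math.e * M / math.sqrt(n)) / n)


print(convex_aggregation_rate(5, 100))    # M <= sqrt(n): 5 / 100 = 0.05
print(convex_aggregation_rate(100, 100))  # M > sqrt(n): sqrt((1 + ln 10) / 100)
```

The two branches agree at $M=\sqrt{n}$, where both equal $1/\sqrt{n}$, so the rate is continuous across the regime change.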

preprint · 2013 · arXiv

Minimax rate of convergence and the performance of ERM in phase recovery

We study the performance of Empirical Risk Minimization in noisy phase retrieval problems, indexed by subsets of $\mathbb{R}^n$ and relative to subgaussian sampling; that is, when the given data is $y_i=\langle a_i,x_0\rangle^2+w_i$ for a subgaussian random vector $a$, independent noise $w$ and a fixed but unknown $x_0$ that belongs to a given subset of $\mathbb{R}^n$. We show that ERM produces $\hat{x}$ whose Euclidean distance to either $x_0$ or $-x_0$ depends on the Gaussian mean-width of the indexing set and on the signal-to-noise ratio of the problem. The bound coincides with the one for linear regression when $\|x_0\|_2$ is of the order of a constant. In addition, we obtain a minimax lower bound for the problem and identify sets for which ERM is a minimax procedure. As examples, we study the class of $d$-sparse vectors in $\mathbb{R}^n$ and the unit ball in $\ell_1^n$.

preprint · 2013 · arXiv

On the optimality of the aggregate with exponential weights for low temperatures

Given a finite class of functions $F$, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure [PAC-Bayesian statistical learning theory (2004) Paris 6; Statistical Learning Theory and Stochastic Optimization (2001) Springer; Ann. Statist. 28 (2000) 75-87] is the aggregate with exponential weights (AEW), defined by \[\tilde{f}^{\mathrm{AEW}}=\sum_{f\in F}\hat{θ}(f)f,\qquad \text{where } \hat{θ}(f)=\frac{\exp(-(n/T)R_n(f))}{\sum_{g\in F}\exp(-(n/T)R_n(g))},\] and $T>0$ is called the temperature parameter and $R_n(\cdot)$ is an empirical risk. In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when $T\leq c_1$, where $c_1$ is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary grows, the behavior of AEW might deteriorate, namely, that in the low-temperature regime it might concentrate with high probability around elements in the dictionary with risk greater than the risk of the best function in the dictionary by at least an order of $1/\sqrt{n}$. Third, we prove that if a geometric condition on the dictionary (the so-called "Bernstein condition") is assumed, then AEW is indeed optimal both in high probability and in expectation in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of "almost minimizers" rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.
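
The exponential weights in the AEW definition can be computed as follows. This is a minimal sketch assuming the empirical risks $R_n(f)$ of the dictionary elements are already available as a list; the helper names `aew_weights` and `aew_predict` are hypothetical, and the subtraction of the largest exponent is a standard numerical-stability device that leaves the weights unchanged.

```python
import math


def aew_weights(empirical_risks, n, temperature):
    """Exponential weights: theta(f) proportional to exp(-(n / T) * R_n(f))."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    exponents = [-(n / temperature) * r for r in empirical_risks]
    # Shift by the maximum exponent to avoid overflow; the ratio is unchanged.
    shift = max(exponents)
    unnormalized = [math.exp(e - shift) for e in exponents]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def aew_predict(weights, predictions):
    """Value of the aggregate at one point: sum over f of theta(f) * f(x)."""
    return sum(w * p for w, p in zip(weights, predictions))


# Equal risks give uniform weights.
print(aew_weights([0.3, 0.3, 0.3], n=50, temperature=1.0))
# A low temperature concentrates the weights on the empirical risk minimizer.
print(aew_weights([0.1, 0.2], n=100, temperature=0.01))
```

As the temperature decreases, the weights concentrate on the element with the smallest empirical risk, which is the low-temperature behaviour the abstract studies.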

preprint · 2012 · arXiv

General nonexact oracle inequalities for classes with a subexponential envelope

We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the setup being free of any boundedness assumption, is that those inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates. We apply these results to show that procedures based on $\ell_1$ and nuclear norm regularization functions satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_q$-loss function ($q\geq2$), while only assuming that the tails of the input and output variables are well behaved. In particular, no RIP type of assumption or "incoherence condition" is needed to obtain fast residual terms in those setups. We also apply these results to the problems of convex aggregation and model selection.

preprint · 2011 · arXiv

Sharper lower bounds on the performance of the empirical risk minimization algorithm

We present an argument based on the multidimensional and the uniform central limit theorems, proving that, under some geometrical assumptions between the target function $T$ and the learning class $F$, the excess risk of the empirical risk minimization algorithm is lower bounded by \[\frac{\mathbb{E}\sup_{q\in Q}G_q}{\sqrt{n}}δ,\] where $(G_q)_{q\in Q}$ is a canonical Gaussian process associated with $Q$ (a well chosen subset of $F$) and $δ$ is a parameter governing the oscillations of the empirical excess risk function over a small ball in $F$.

preprint · 2011 · arXiv

Weighted algorithms for compressed sensing and matrix completion

This paper is about iteratively reweighted basis-pursuit algorithms for compressed sensing and matrix completion problems. In a first part, we give a theoretical explanation of the fact that reweighted basis pursuit can substantially improve upon basis pursuit for exact recovery in compressed sensing. We exhibit a condition that links the accuracy of the weights to the RIP and incoherency constants, which ensures exact recovery. In a second part, we introduce a new algorithm for matrix completion, based on the idea of iterative reweighting. Since a weighted nuclear "norm" is typically non-convex, it cannot be used easily as an objective function. So, we define a new estimator based on a fixed-point equation. We give empirical evidence that this new algorithm leads to strong improvements over nuclear norm minimization on simulated and real matrix completion problems.

preprint · 2010 · arXiv

Sharp oracle inequalities for the prediction of a high-dimensional matrix

We observe $(X_i,Y_i)_{i=1}^n$ where the $Y_i$'s are real valued outputs and the $X_i$'s are $m\times T$ matrices. We observe a new entry $X$ and we want to predict the output $Y$ associated with it. We focus on the high-dimensional setting, where $m T \gg n$. This includes the matrix completion problem with noise, as well as other problems. We consider linear prediction procedures based on different penalizations, involving a mixture of several norms: the nuclear norm, the Frobenius norm and the $\ell_1$-norm. For these procedures, we prove sharp oracle inequalities, using a statistical learning theory point of view. A surprising fact in our results is that the rates of convergence do not depend on $m$ and $T$ directly. The analysis is conducted without the usually considered incoherency condition on the unknown matrix or restricted isometry condition on the sampling operator. Moreover, our results are the first to give for this problem an analysis of penalization (such as nuclear norm penalization) as a regularization algorithm: our oracle inequalities prove that these procedures have a prediction accuracy close to the deterministic oracle one, given that the regularization parameters are well-chosen.

preprint · 2010 · arXiv

The Logarithmic Sobolev Constant of The Lamplighter

We give estimates on the logarithmic Sobolev constant of some finite lamplighter graphs in terms of the spectral gap of the underlying base. We also give examples of applications.

preprint · 2006 · arXiv

Adapting to Unknown Smoothness by Aggregation of Thresholded Wavelet Estimators

We study the performance of an adaptive procedure based on a convex combination, with data-driven weights, of term-by-term thresholded wavelet estimators. For the bounded regression model, with random uniform design, and the nonparametric density model, we show that the resulting estimator is optimal in the minimax sense over all Besov balls under the $L^2$ risk, without any logarithmic factor.

preprint · 2006 · arXiv

Classification with Minimax Fast Rates for Classes of Bayes Rules with Sparse Representation

We construct a classifier which attains the rate of convergence $\log n/n$ under sparsity and margin assumptions. An approach close to the one met in approximation theory for the estimation of functions is used to obtain this result. The idea is to develop the Bayes rule in a fundamental system of $L^2([0,1]^d)$ made of indicators of dyadic sets and to assume that the coefficients, equal to $-1$, $0$ or $1$, belong to a kind of $L^1$-ball. This assumption can be seen as a sparsity assumption, in the sense that the proportion of coefficients not equal to zero decreases as "frequency" grows. Finally, rates of convergence are obtained by using a usual trade-off between a bias term and a variance term.

preprint · 2006 · arXiv

Lower bounds and aggregation in density estimation

In this paper we prove the optimality of an aggregation procedure. We prove lower bounds for aggregation of model selection type of $M$ density estimators for the Kullback-Leibler divergence (KL), the Hellinger distance and the $L_1$-distance. The lower bound, with respect to the KL distance, can be achieved by the on-line type estimate suggested, among others, by Yang (2000). Combining these results, we state that $\log M/n$ is an optimal rate of aggregation in the sense of Tsybakov (2003), where $n$ is the sample size.

preprint · 2006 · arXiv

Optimal oracle inequality for aggregation of classifiers under low noise condition

We consider the problem of optimality, in a minimax sense, and adaptivity to the margin and to regularity in binary classification. We prove an oracle inequality, under the margin assumption (low noise condition), satisfied by an aggregation procedure which uses exponential weights. This oracle inequality has an optimal residual: $(\log M/n)^{κ/(2κ-1)}$ where $κ$ is the margin parameter, $M$ the number of classifiers to aggregate and $n$ the number of observations. We use this inequality first to construct minimax classifiers under margin and regularity assumptions and second to aggregate them to obtain a classifier which is adaptive both to the margin and regularity. Moreover, by aggregating plug-in classifiers (only $\log n$ of them), we provide an easily implementable classifier adaptive both to the margin and to regularity.
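
To see how the margin parameter shapes the residual $(\log M/n)^{κ/(2κ-1)}$, the formula can be evaluated numerically. The sketch below only evaluates the expression from the abstract, without constants; the function name `aggregation_residual` and the restriction $κ\geq 1$ are assumptions made for illustration.

```python
import math


def aggregation_residual(M, n, kappa):
    """Residual (log M / n) ** (kappa / (2 * kappa - 1)), constants omitted."""
    if M < 1 or n < 1:
        raise ValueError("M and n must be at least 1")
    if kappa < 1:
        raise ValueError("this sketch assumes a margin parameter kappa >= 1")
    return (math.log(M) / n) ** (kappa / (2 * kappa - 1))


# kappa = 1 gives the fast rate log(M) / n.
print(aggregation_residual(100, 1000, 1.0))
# Large kappa approaches the slow rate sqrt(log(M) / n).
print(aggregation_residual(100, 1000, 1e9))
```

The exponent $κ/(2κ-1)$ equals $1$ at $κ=1$ and decreases towards $1/2$ as $κ$ grows, so the residual interpolates between $\log M/n$ and $\sqrt{\log M/n}$.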
