Source author record

Geoffrey Chinot

Geoffrey Chinot appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.ST · Statistics Theory

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint · 2021 · arXiv

On the robustness of the minimum $\ell_2$ interpolator

We analyse the interpolator with minimal $\ell_2$-norm $\hat{β}$ in a general high-dimensional linear regression framework $\mathbb Y=\mathbb Xβ^*+ξ$, where $\mathbb X$ is a random $n\times p$ matrix with independent $\mathcal N(0,Σ)$ rows and no assumption is made on the noise vector $ξ\in \mathbb R^n$. We prove that, with high probability, the prediction loss of this estimator is bounded from above by $(\|β^*\|^2_2r_{cn}(Σ)\vee \|ξ\|_2^2)/n$, where $r_{k}(Σ)=\sum_{i\geq k}λ_i(Σ)$ are the tail sums of the eigenvalues of $Σ$. These bounds show a transition in the rates. For high signal-to-noise ratios, the rates $\|β^*\|^2_2r_{cn}(Σ)/n$ broadly improve the existing ones. For low signal-to-noise ratios, we also provide a lower bound holding with large probability. Under assumptions on the spectrum of $Σ$, this lower bound is of order $\|ξ\|_2^2/n$, matching the upper bound. Consequently, in the large noise regime, we are able to precisely track the prediction error with large probability. These results give new insight into when interpolation can be harmless in high dimensions.
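The two objects named in this abstract, the minimum $\ell_2$-norm interpolator and the tail sum $r_k(Σ)$, can be illustrated with a short NumPy sketch. This is not code from the paper: the function names, the synthetic spectrum, the noise level, and the choice $k=n$ (the abstract leaves the constant $c$ in $r_{cn}$ unspecified) are all assumptions made for illustration, as is the reading of the prediction loss as $(\hat{β}-β^*)^\top Σ(\hat{β}-β^*)$.

```python
import numpy as np

def min_l2_interpolator(X, y):
    """Minimum l2-norm solution of X @ beta = y (assumes p >= n and full row rank)."""
    # The pseudo-inverse picks the least-norm vector among all interpolating solutions.
    return np.linalg.pinv(X) @ y

def tail_eigen_sum(Sigma, k):
    """r_k(Sigma): sum of eigenvalues lambda_i for i >= k, sorted decreasingly, 1-indexed."""
    eig = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    return float(eig[k - 1:].sum())

# Hypothetical synthetic setup, chosen only for illustration.
rng = np.random.default_rng(0)
n, p = 20, 200
eigs = 1.0 / np.arange(1, p + 1)                  # assumed spectrum of Sigma
Sigma = np.diag(eigs)
X = rng.standard_normal((n, p)) * np.sqrt(eigs)   # rows drawn from N(0, Sigma)
beta_star = np.zeros(p)
beta_star[0] = 1.0
xi = 0.1 * rng.standard_normal(n)                 # any noise vector is allowed here
y = X @ beta_star + xi

beta_hat = min_l2_interpolator(X, y)

# Assumed form of the prediction loss: (beta_hat - beta_star)^T Sigma (beta_hat - beta_star).
diff = beta_hat - beta_star
pred_loss = float(diff @ Sigma @ diff)
r_n = tail_eigen_sum(Sigma, n)                    # tail sum with the illustrative choice k = n

print("interpolates:", np.allclose(X @ beta_hat, y))
print("prediction loss:", pred_loss, "tail sum r_n:", r_n)
```

The sketch only computes the quantities; it makes no claim about the constants in the bound.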

Preprint · 2021 · arXiv

Robust high dimensional learning for Lipschitz and convex losses

We establish risk bounds for Regularized Empirical Risk Minimizers (RERM) when the loss is Lipschitz and convex and the regularization function is a norm. In a first part, we obtain these results in the i.i.d. setup under subgaussian assumptions on the design. In a second part, we consider a more general framework where the design might have heavier tails and the data may be corrupted by outliers in both the design and the response variables. In this situation, RERM performs poorly in general. We analyse an alternative procedure based on median-of-means principles, called minmax MOM. We show optimal subgaussian deviation rates for these estimators in the relaxed setting. The main results are meta-theorems allowing a wide range of applications to various problems in learning theory. To show a non-exhaustive sample of these potential applications, the results are applied to classification problems with logistic loss functions regularized by LASSO and SLOPE, and to regression problems with Huber loss regularized by Group LASSO and Total Variation. Another advantage of the minmax MOM formulation is that it suggests a systematic way to slightly modify descent-based algorithms used in high-dimensional statistics to make them robust to outliers. We illustrate this principle in a Simulations section, where classical proximal descent algorithms are turned into outlier-robust algorithms through their minmax MOM versions.
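The median-of-means principle that minmax MOM builds on can be sketched in a few lines of standard-library Python. This is the basic median-of-means estimator of a mean, not the paper's minmax MOM procedure; the function name, the shuffling step, and the toy data are assumptions made for illustration.

```python
import random
import statistics

def median_of_means(values, n_blocks, seed=0):
    """Basic median-of-means estimate of the mean of `values`.

    The data are shuffled, split into `n_blocks` equal-size blocks
    (leftover points are dropped), and the median of the block means
    is returned.
    """
    data = list(values)
    if not 1 <= n_blocks <= len(data):
        raise ValueError("n_blocks must be between 1 and len(values)")
    random.Random(seed).shuffle(data)
    block_size = len(data) // n_blocks
    block_means = [
        statistics.fmean(data[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)

# Toy data: 95 clean points equal to 1.0 and 5 gross outliers.
sample = [1.0] * 95 + [1e6] * 5

# Five outliers can contaminate at most 5 of the 11 blocks, so at least
# 6 block means equal 1.0 and the median is 1.0 for any shuffle.
print(median_of_means(sample, n_blocks=11))  # 1.0
print(statistics.fmean(sample))              # the plain mean is dominated by the outliers
```

The number of blocks trades robustness against variance: more blocks tolerate more outliers, but each block mean is computed from fewer points.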

Preprint · 2020 · arXiv

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. With a few exceptions, most studies have focused on regression problems with the squared loss function, and the importance of the positivity of the neural tangent kernel has been pointed out. On the other hand, the performance of gradient descent on classification problems using the logistic loss function has not been well studied, and further investigation of this problem structure is possible. In this work, we demonstrate that the separability assumption using a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel, and we provide a refined convergence analysis of gradient descent for two-layer networks with smooth activations. A remarkable point of our result is that our convergence and generalization bounds have a much better dependence on the network width than those of related studies. Consequently, our theory provides a generalization guarantee for less over-parameterized two-layer networks, while most studies require much higher over-parameterization.