Source author record

Alain Celisse

Alain Celisse appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.ST Statistics Theory Machine Learning Applications Methodology Quantitative Methods

Catalog footprint

What is connected

9works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Analyzing the discrepancy principle for kernelized spectral filter learning algorithms

We investigate the construction of early stopping rules in the nonparametric regression problem where iterative learning algorithms are used and the optimal iteration number is unknown. More precisely, we study the discrepancy principle, as well as modifications based on smoothed residuals, for kernelized spectral filter learning algorithms including gradient descent. Our main theoretical bounds are oracle inequalities established for the empirical estimation error (fixed design), and for the prediction error (random design). From these finite-sample bounds it follows that the classical discrepancy principle is statistically adaptive for slow rates occurring in the hard learning scenario, while the smoothed discrepancy principles are adaptive over ranges of faster rates (resp. higher smoothness parameters). Our approach relies on deviation inequalities for the stopping rules in the fixed design setting, combined with change-of-norm arguments to deal with the random design setting.

preprint2016arXiv

Stability revisited: new generalisation bounds for the Leave-one-Out

The present paper provides a new generic strategy leading to non-asymptotic theoretical guarantees on the Leave-one-Out procedure applied to a broad class of learning algorithms. This strategy relies on two main ingredients: the new notion of $L^q$ stability, and the strong use of moment inequalities. $L^q$ stability extends the ongoing notion of hypothesis stability while remaining weaker than the uniform stability. It leads to new PAC exponential generalisation bounds for Leave-one-Out under mild assumptions. In the literature, such bounds are available only for uniform stable algorithms under boundedness for instance. Our generic strategy is applied to the Ridge regression algorithm as a first step.

preprint2015arXiv

A One-Sample Test for Normality with Kernel Methods

We propose a new one-sample test for normality in a Reproducing Kernel Hilbert Space (RKHS). Namely, we test the null-hypothesis of belonging to a given family of Gaussian distributions. Hence our procedure may be applied either to test data for normality or to test parameters (mean and covariance) if data are assumed Gaussian. Our test is based on the same principle as the MMD (Maximum Mean Discrepancy) which is usually used for two-sample tests such as homogeneity or independence testing. Our method makes use of a special kind of parametric bootstrap (typical of goodness-of-fit tests) which is computationally more efficient than standard parametric bootstrap. Moreover, an upper bound for the Type-II error highlights the dependence on influential quantities. Experiments illustrate the practical improvement allowed by our test in high-dimensional settings where common normality tests are known to fail. We also consider an application to covariance rank selection through a sequential procedure.

preprint2014arXiv

MPAgenomics : An R package for multi-patients analysis of genomic markers

MPAgenomics, standing for multi-patients analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation, and (ii) genomic marker selection from multi-patient copy number and SNP data profiles. It provides wrappers from commonly used packages to facilitate their repeated (sometimes difficult) use, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is based on a new automatic choice of influential parameters since default ones were misleading in the original packages. Considering multiple profiles in the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given response.

preprint2014arXiv

New normality test in high dimension with kernel methods

A new goodness-of-fit test for normality in high-dimension (and Reproducing Kernel Hilbert Space) is proposed. It shares common ideas with the Maximum Mean Discrepancy (MMD) it outperforms both in terms of computation time and applicability to a wider range of data. Theoretical results are derived for the Type-I and Type-II errors. They guarantee the control of Type-I error at prescribed level and an exponentially fast decrease of the Type-II error. Synthetic and real data also illustrate the practical improvement allowed by our test compared with other leading approaches in high-dimensional settings.

preprint2014arXiv

Optimal cross-validation in density estimation with the $L^2$-loss

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with $p=1$, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size $n$, optimality is achieved for $p$ large enough [with $p/n=o(1)$] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as $p/n$ is conveniently related to the rate of convergence of the best estimator in the collection: (i) $p/n\to1$ as $n\to+\infty$ with a parametric rate, and (ii) $p/n=o(1)$ with some nonparametric estimators. These theoretical results are validated by simulation experiments.

preprint2012arXiv

Consistency of maximum-likelihood and variational estimators in the Stochastic Block Model

The stochastic block model (SBM) is a probabilistic model de- signed to describe heterogeneous directed and undirected graphs. In this paper, we address the asymptotic inference on SBM by use of maximum- likelihood and variational approaches. The identi ability of SBM is proved, while asymptotic properties of maximum-likelihood and variational esti- mators are provided. In particular, the consistency of these estimators is settled, which is, to the best of our knowledge, the rst result of this type for variational estimators with random graphs.

preprint2009arXiv

A survey of cross-validation procedures for model selection

Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.

preprint2009arXiv

Segmentation of the mean of heteroscedastic data via cross-validation

This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge on the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness to heteroscedasticity of the proposed procedures is supported by an extensive simulation study, together with recent theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for their analysis.

Alain Celisse

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Analyzing the discrepancy principle for kernelized spectral filter learning algorithms

Stability revisited: new generalisation bounds for the Leave-one-Out

A One-Sample Test for Normality with Kernel Methods

MPAgenomics : An R package for multi-patients analysis of genomic markers

New normality test in high dimension with kernel methods

Optimal cross-validation in density estimation with the $L^2$-loss

Consistency of maximum-likelihood and variational estimators in the Stochastic Block Model

A survey of cross-validation procedures for model selection

Segmentation of the mean of heteroscedastic data via cross-validation