Source author record

Gilles Blanchard

Gilles Blanchard appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.ST Statistics Theory Methodology math.PR Artificial Intelligence Computation Computational Geometry math.CO

Catalog footprint

What is connected

21works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A phase transition in Erdős-Barak random graphs

We study monotone paths in Erdős-Rényi random graphs on numbered vertices. Benjamini & Tzalik established a phase transition at $p = \frac{\log n}{n}$ for this model. We refine the critical value to $p = \frac{\log n - \log \log n }{n}$ and identify the critical window of order $Θ(1/n)$.

preprint2026arXiv

A Topological Sorting Criterion for Random Causal Directed Acyclic Graphs

Random directed acyclic graphs (DAGs) based on imposing an order on Erdős-Rényi and scale free random graphs are widely used for evaluating causal discovery algorithms. We show that in such DAGs, the set of nodes reachable via open paths, termed relatives, increases monotonically along the causal order. We assess the prevalence of this pattern numerically, and demonstrate that it can be exploited for causal order recovery via sorting by the estimated number of relatives. We note that many simulations in the literature feature settings where this yields an excellent proxy for the causal order, and show that a strict increase of relatives along the causal order leads to a singular Markov equivalence class. We propose sampling time-series DAGs as a possible alternative and discuss implications for causal discovery algorithms and their evaluation on synthetic data.

preprint2026arXiv

Statistical learning on measures: an application to persistence diagrams

We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (μ_1, Y_1), \ldots, (μ_N, Y_N)$ where $μ_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of base-classifiers on $\mathcal{X}$, we build corresponding classifiers in the space of measures. We provide upper and lower bounds on the Rademacher complexity of this new class of classifiers that can be expressed simply in terms of corresponding quantities for the class $\mathcal{F}$. If the measures $μ_i$ are uniform over a finite set, this classification task boils down to a multi-instance learning problem. However, our approach allows more flexibility and diversity in the input data we can deal with. While such a framework has many possible applications, this work strongly emphasizes on classifying data via topological descriptors called persistence diagrams. These objects are discrete measures on $\mathbb{R}^2$, where the coordinates of each point correspond to the range of scales at which a topological feature exists. We will present several classifiers on measures and show how they can heuristically and theoretically enable a good classification performance in various settings in the case of persistence diagrams.

preprint2022arXiv

Topologically penalized regression on manifolds

We study a regression problem on a compact manifold M. In order to take advantage of the underlying geometry and topology of the data, the regression task is performed on the basis of the first several eigenfunctions of the Laplace-Beltrami operator of the manifold, that are regularized with topological penalties. The proposed penalties are based on the topology of the sub-level sets of either the eigenfunctions or the estimated function. The overall approach is shown to yield promising and competitive performance on various applications to both synthetic and real data sets. We also provide theoretical guarantees on the regression function estimates, on both its prediction error and its smoothness (in a topological sense). Taken together, these results support the relevance of our approach in the case where the targeted function is ''topologically smooth''.

preprint2021arXiv

Domain Generalization by Marginal Transfer Learning

In the problem of domain generalization (DG), there are labeled training data sets from several related prediction problems, and the goal is to make accurate predictions on future unlabeled data sets that are not known to the learner. This problem arises in several applications where data distributions fluctuate because of environmental, technical, or other sources of variation. We introduce a formal framework for DG, and argue that it can be viewed as a kind of supervised learning problem by augmenting the original feature space with the marginal distribution of feature vectors. While our framework has several connections to conventional analysis of supervised learning algorithms, several unique aspects of DG require new methods of analysis. This work lays the learning theoretic foundations of domain generalization, building on our earlier conference paper where the problem of DG was introduced (Blanchard et al., 2011). We present two formal models of data generation, corresponding notions of risk, and distribution-free generalization error analysis. By focusing our attention on kernel methods, we also provide more quantitative results and a universally consistent algorithm. An efficient implementation is provided for this algorithm, which is experimentally compared to a pooling strategy on one synthetic and three real-world data sets.

preprint2021arXiv

Online Orthogonal Matching Pursuit

Greedy algorithms for feature selection are widely used for recovering sparse high-dimensional vectors in linear models. In classical procedures, the main emphasis was put on the sample complexity, with little or no consideration of the computation resources required. We present a novel online algorithm: Online Orthogonal Matching Pursuit (OOMP) for online support recovery in the random design setting of sparse linear regression. Our procedure selects features sequentially, alternating between allocation of samples only as needed to candidate features, and optimization over the selected set of variables to estimate the regression coefficients. Theoretical guarantees about the output of this algorithm are proven and its computational complexity is analysed.

preprint2020arXiv

Volume Doubling Condition and a Local Poincaré Inequality on Unweighted Random Geometric Graphs

The aim of this paper is to establish two fundamental measure-metric properties of particular random geometric graphs. We consider $\varepsilon$-neighborhood graphs whose vertices are drawn independently and identically distributed from a common distribution defined on a regular submanifold of $\mathbb{R}^K$. We show that a volume doubling condition (VD) and local Poincaré inequality (LPI) hold for the random geometric graph (with high probability, and uniformly over all shortest path distance balls in a certain radius range) under suitable regularity conditions of the underlying submanifold and the sampling distribution.

preprint2016arXiv

Classification with Asymmetric Label Noise: Consistency and Maximal Denoising

In many real-world classification problems, the labels of training examples are randomly corrupted. Most previous theoretical work on classification with label noise assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. In this work, we give conditions that are necessary and sufficient for the true class-conditional distributions to be identifiable. These conditions are weaker than those analyzed previously, and allow for the classes to be nonseparable and the noise levels to be asymmetric and unknown. The conditions essentially state that a majority of the observed labels are correct and that the true class-conditional distributions are "mutually irreducible," a concept we introduce that limits the similarity of the two distributions. For any label noise problem, there is a unique pair of true class-conditional distributions satisfying the proposed conditions, and we argue that this pair corresponds in a certain sense to maximal denoising of the observed distributions. Our results are facilitated by a connection to "mixture proportion estimation," which is the problem of estimating the maximal proportion of one distribution that is present in another. We establish a novel rate of convergence result for mixture proportion estimation, and apply this to obtain consistency of a discrimination rule based on surrogate loss minimization. Experimental results on benchmark data and a nuclear particle classification problem demonstrate the efficacy of our approach.

preprint2016arXiv

Convergence rates of Kernel Conjugate Gradient for random design regression

We prove statistical rates of convergence for kernel-based least squares regression from i.i.d. data using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. Following the setting introduced in earlier related literature, we study so-called "fast convergence rates" depending on the regularity of the target regression function (measured by a source condition in terms of the kernel integral operator) and on the effective dimensionality of the data mapped into the kernel space. We obtain upper bounds, essentially matching known minimax lower bounds, for the $\mathcal{L}^2$ (prediction) norm as well as for the stronger Hilbert norm, if the true regression function belongs to the reproducing kernel Hilbert space. If the latter assumption is not fulfilled, we obtain similar convergence rates for appropriate norms, provided additional unlabeled data are available.

preprint2016arXiv

Kernel regression, minimax rates and effective dimensionality: beyond the regular case

We investigate if kernel regularization methods can achieve minimax convergence rates over a source condition regularity assumption for the target function. These questions have been considered in past literature, but only under specific assumptions about the decay, typically polynomial, of the spectrum of the the kernel mapping covariance operator. In the perspective of distribution-free results, we investigate this issue under much weaker assumption on the eigenvalue decay, allowing for more complex behavior that can reflect different structure of the data at different scales.

preprint2016arXiv

Optimal Rates For Regularization Of Statistical Inverse Learning Problems

We consider a statistical inverse learning problem, where we observe the image of a function $f$ through a linear operator $A$ at i.i.d. random design points $X_i$, superposed with an additive noise. The distribution of the design points is unknown and can be very general. We analyze simultaneously the direct (estimation of $Af$) and the inverse (estimation of $f$) learning problems. In this general framework, we obtain strong and weak minimax optimal rates of convergence (as the number of observations $n$ grows large) for a large class of spectral regularization methods over regularity classes defined through appropriate source conditions. This improves on or completes previous results obtained in related settings. The optimality of the obtained rates is shown not only in the exponent in $n$ but also in the explicit dependency of the constant factor in the variance of the noise and the radius of the source condition set.

preprint2016arXiv

Permutational Rademacher Complexity: a New Complexity Measure for Transductive Learning

Transductive learning considers situations when a learner observes $m$ labelled training points and $u$ unlabelled test points with the final goal of giving correct answers for the test points. This paper introduces a new complexity measure for transductive learning called Permutational Rademacher Complexity (PRC) and studies its properties. A novel symmetrization inequality is proved, which shows that PRC provides a tighter control over expected suprema of empirical processes compared to what happens in the standard i.i.d. setting. A number of comparison results are also provided, which show the relation between PRC and other popular complexity measures used in statistical learning theory, including Rademacher complexity and Transductive Rademacher Complexity (TRC). We argue that PRC is a more suitable complexity measure for transductive learning. Finally, these results are combined with a standard concentration argument to provide novel data-dependent risk bounds for transductive learning.

preprint2015arXiv

Extensions of stability selection using subsamples of observations and covariates

We introduce extensions of stability selection, a method to stabilise variable selection methods introduced by Meinshausen and Bühlmann (J R Stat Soc 72:417-473, 2010). We propose to apply a base selection method repeatedly to random observation subsamples and covariate subsets under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalizes the theoretical results of Meinshausen and Bühlmann (J R Stat Soc 72:417-473, 2010) from the case of half-samples to subsamples of arbitrary size. We study, in a theoretical manner, the effect of taking random covariate subsets using a simplified score model. Finally we validate these extensions on numerical experiments on both synthetic and real datasets, and compare the obtained results in detail to the original stability selection method.

preprint2014arXiv

Localized Complexities for Transductive Learning

We show two novel concentration inequalities for suprema of empirical processes when sampling without replacement, which both take the variance of the functions into account. While these inequalities may potentially have broad applications in learning theory in general, we exemplify their significance by studying the transductive setting of learning theory. For which we provide the first excess risk bounds based on the localized complexity of the hypothesis class, which can yield fast rates of convergence also in the transductive learning setting. We give a preliminary analysis of the localized complexities for the prominent case of kernel classes.

preprint2014arXiv

Testing over a continuum of null hypotheses with False Discovery Rate control

We consider statistical hypothesis testing simultaneously over a fairly general, possibly uncountably infinite, set of null hypotheses, under the assumption that a suitable single test (and corresponding $p$-value) is known for each individual hypothesis. We extend to this setting the notion of false discovery rate (FDR) as a measure of type I error. Our main result studies specific procedures based on the observation of the $p$-value process. Control of the FDR at a nominal level is ensured either under arbitrary dependence of $p$-values, or under the assumption that the finite dimensional distributions of the $p$-value process have positive correlations of a specific type (weak PRDS). Both cases generalize existing results established in the finite setting. Its interest is demonstrated in several non-parametric examples: testing the mean/signal in a Gaussian white noise model, testing the intensity of a Poisson process and testing the c.d.f. of i.i.d. random variables.

preprint2011arXiv

On least favorable configurations for step-up-down tests

This paper investigates an open issue related to false discovery rate (FDR) control of step-up-down (SUD) multiple testing procedures. It has been established in earlier literature that for this type of procedure, under some broad conditions, and in an asymptotical sense, the FDR is maximum when the signal strength under the alternative is maximum. In other words, so-called "Dirac uniform configurations" are asymptotically {\em least favorable} in this setting. It is known that this property also holds in a non-asymptotical sense (for any finite number of hypotheses), for the two extreme versions of SUD procedures, namely step-up and step-down (with extra conditions for the step-down case). It is therefore very natural to conjecture that this non-asymptotical {\em least favorable configuration} property could more generally be true for all "intermediate" forms of SUD procedures. We prove that this is, somewhat surprisingly, not the case. The argument is based on the exact calculations proposed earlier by Roquain and Villers (2011), that we extend here by generalizing Steck's recursion to the case of two populations. Secondly, we quantify the magnitude of this phenomenon by providing a nonasymptotic upper-bound and explicit vanishing rates as a function of the total number of hypotheses.

preprint2011arXiv

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning

We derive an upper bound on the local Rademacher complexity of $\ell_p$-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches aimed at analyzed the case $p=1$ only while our analysis covers all cases $1\leq p\leq\infty$, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also show a lower bound that shows that the bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order $O(n^{-\fracα{1+α}})$, where $α$ is the minimum eigenvalue decay rate of the individual kernels.

preprint2010arXiv

Kernel Partial Least Squares is Universally Consistent

We prove the statistical consistency of kernel Partial Least Squares Regression applied to a bounded regression learning problem on a reproducing kernel Hilbert space. Partial Least Squares stands out of well-known classical approaches as e.g. Ridge Regression or Principal Components Regression, as it is not defined as the solution of a global cost minimization procedure over a fixed model nor is it a linear estimator. Instead, approximate solutions are constructed by projections onto a nested set of data-dependent subspaces. To prove consistency, we exploit the known fact that Partial Least Squares is equivalent to the conjugate gradient algorithm in combination with early stopping. The choice of the stopping rule (number of iterations) is a crucial point. We study two empirical stopping rules. The first one monitors the estimation error in each iteration step of Partial Least Squares, and the second one estimates the empirical complexity in terms of a condition number. Both stopping rules lead to universally consistent estimators provided the kernel is universal.

preprint2010arXiv

Optimal learning rates for Kernel Conjugate Gradient regression

We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the intrinsic dimensionality of the data mapped into the kernel space. Lower bounds on attainable rates depending on these two quantities were established in earlier literature, and we obtain upper bounds for the considered method that match these lower bounds (up to a log factor) if the true regression function belongs to the reproducing kernel Hilbert space. If this assumption is not fulfilled, we obtain similar convergence rates provided additional unlabeled data are available. The order of the learning rates match state-of-the-art results that were recently obtained for least squares support vector machines and for linear regularization operators.

preprint2010arXiv

Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests

We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities.

preprint2007arXiv

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083]

Gilles Blanchard

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

A phase transition in Erdős-Barak random graphs

A Topological Sorting Criterion for Random Causal Directed Acyclic Graphs

Statistical learning on measures: an application to persistence diagrams

Topologically penalized regression on manifolds

Domain Generalization by Marginal Transfer Learning

Online Orthogonal Matching Pursuit

Volume Doubling Condition and a Local Poincaré Inequality on Unweighted Random Geometric Graphs

Classification with Asymmetric Label Noise: Consistency and Maximal Denoising

Convergence rates of Kernel Conjugate Gradient for random design regression

Kernel regression, minimax rates and effective dimensionality: beyond the regular case

Optimal Rates For Regularization Of Statistical Inverse Learning Problems

Permutational Rademacher Complexity: a New Complexity Measure for Transductive Learning

Extensions of stability selection using subsamples of observations and covariates

Localized Complexities for Transductive Learning

Testing over a continuum of null hypotheses with False Discovery Rate control

On least favorable configurations for step-up-down tests

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning

Kernel Partial Least Squares is Universally Consistent

Optimal learning rates for Kernel Conjugate Gradient regression

Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii