Source author record

François Bachoc

François Bachoc appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.ST, Statistics Theory, Machine Learning, Applications, Artificial Intelligence, Computation, math.NA, math.OC, Neural and Evolutionary Computing

Catalog footprint

What is connected

24 works

9 topics

4 close collaborators

preprint 2026 arXiv

Geometry-induced Regularization in Deep ReLU Neural Networks

Neural networks with a large number of parameters often do not overfit, owing to implicit regularization that favors "good" networks. Other related and puzzling phenomena include properties of flat minima, saddle-to-saddle dynamics, and neuron alignment. To investigate these phenomena, we study the local geometry of deep ReLU neural networks. We show that, for a fixed architecture, as the weights vary, the image of a sample $X$ forms a set whose local dimension changes. The parameter space is partitioned into regions where this local dimension remains constant. The local dimension is invariant under the natural symmetries of ReLU networks (i.e., positive rescalings and neuron permutations). We then establish that the network's geometry induces a regularization, with the local dimension serving as a key measure of regularity. Moreover, we relate the local dimension to a new notion of flatness of minima and to saddle-to-saddle dynamics. For shallow networks, we also show that the local dimension is connected to the number of linear regions perceived by $X$, offering insight into the effects of regularization. This is further supported by experiments and linked to neuron alignment. Our analysis offers, for the first time, a simple and unified geometric explanation that applies to all learning contexts for these phenomena, which are usually studied in isolation. Finally, we explore the practical computation of the local dimension and present experiments on the MNIST dataset, which highlight geometry-induced regularization in this setting.

preprint 2022 arXiv

Explaining Machine Learning Models using Entropic Variable Projection

In this paper, we present a new explainability formalism designed to shed light on how each input variable of a test set impacts the predictions of machine learning models. Hence, we propose a group explainability formalism for trained machine learning decision rules, based on their response to the variability of the input variables distribution. In order to emphasize the impact of each input variable, this formalism uses an information theory framework that quantifies the influence of all input-output observations based on entropic projections. This is thus the first unified and model agnostic formalism enabling data scientists to interpret the dependence between the input variables, their impact on the prediction errors, and their influence on the output predictions. Convergence rates of the entropic projections are provided in the large sample case. Most importantly, we prove that computing an explanation in our framework has a low algorithmic complexity, making it scalable to real-life large datasets. We illustrate our strategy by explaining complex decision rules learned by using XGBoost, Random Forest or Deep Neural Network classifiers on various datasets such as Adult Income, MNIST, CelebA, Boston Housing, Iris, as well as synthetic ones. We finally make clear its differences with the explainability strategies LIME and SHAP, that are based on single observations. Results can be reproduced by using the freely distributed Python toolbox https://gems-ai.aniti.fr/.

preprint 2022 arXiv

High-dimensional additive Gaussian processes under monotonicity constraints

We introduce an additive Gaussian process framework accounting for monotonicity constraints and scalable to high dimensions. Our contributions are threefold. First, we show that our framework makes it possible to satisfy the constraints everywhere in the input space. We also show that more general componentwise linear inequality constraints can be handled similarly, such as componentwise convexity. Second, we propose the additive MaxMod algorithm for sequential dimension reduction. By sequentially maximizing a squared-norm criterion, MaxMod identifies the active input dimensions and refines the most important ones. This criterion can be computed explicitly at a linear cost. Finally, we provide open-source codes for our full framework. We demonstrate the performance and scalability of the methodology in several synthetic examples with hundreds of dimensions under monotonicity constraints as well as on a real-world flood application.

preprint 2022 arXiv

Multivariate Gaussian Random Fields over Generalized Product Spaces involving the Hypertorus

The paper deals with multivariate Gaussian random fields defined over generalized product spaces that involve the hypertorus. The assumption of Gaussianity implies that the finite-dimensional distributions are completely specified by the covariance functions, which are in this case matrix-valued mappings. We start by considering the spectral representations that in turn allow for a characterization of such covariance functions. We then provide some methods for the construction of these matrix-valued mappings. Finally, we consider strategies to evade radial symmetry (called isotropy in spatial statistics) and provide representation theorems for this more general case.

preprint 2022 arXiv

Sequential construction and dimension reduction of Gaussian processes under constraints

Accounting for inequality constraints, such as boundedness, monotonicity or convexity, is challenging when modeling costly-to-evaluate black box functions. In this regard, finite-dimensional Gaussian process (GP) regression models bring a valuable solution, as they guarantee that the inequality constraints are satisfied everywhere. Nevertheless, these models are currently restricted to small dimensional situations (up to dimension 5). Addressing this issue, we introduce the MaxMod algorithm that sequentially inserts one-dimensional knots or adds active variables, thereby performing at the same time dimension reduction and efficient knot allocation. We prove the convergence of this algorithm. In intermediary steps of the proof, we propose the notion of multi-affine extension and study its properties. We also prove the convergence of finite-dimensional GPs, when the knots are not dense in the input space, extending the recent literature. With simulated and real data, we demonstrate that the MaxMod algorithm remains efficient in higher dimension (at least in dimension 20), and needs fewer knots than other constrained GP models from the state-of-the-art, to reach a given approximation error.

preprint 2021 arXiv

Properties and comparison of some Kriging sub-model aggregation methods

Kriging is a widely employed technique, in particular for computer experiments, in machine learning or in geostatistics. An important challenge for Kriging is the computational burden when the data set is large. This article focuses on a class of methods aiming at decreasing this computational cost, consisting in aggregating Kriging predictors based on smaller data subsets. It proves that aggregation methods that ignore the covariance between sub-models can yield an inconsistent final Kriging prediction. In contrast, a theoretical study of the nested Kriging method shows additional attractive properties for it: first, this predictor is consistent; second, it can be interpreted as an exact conditional distribution for a modified process; and third, the conditional covariances given the observations can be computed efficiently. This article also includes a theoretical and numerical analysis of how the assignment of the observation points to the sub-models can affect the prediction ability of the aggregated model. Finally, the nested Kriging method is extended to measurement errors and to universal Kriging.

preprint 2021 arXiv

The sample complexity of level set approximation

We study the problem of approximating the level set of an unknown function by sequentially querying its values. We introduce a family of algorithms called Bisect and Approximate through which we reduce the level set approximation problem to a local function approximation problem. We then show how this approach leads to rate-optimal sample complexity guarantees for Hölder functions, and we investigate how such rates improve when additional smoothness or other structural assumptions hold true.

preprint 2020 arXiv

Asymptotic analysis of maximum likelihood estimation of covariance parameters for Gaussian processes: an introduction with proofs

This article provides an introduction to the asymptotic analysis of covariance parameter estimation for Gaussian processes. Maximum likelihood estimation is considered. The aim of this introduction is to be accessible to a wide audience and to present some existing results and proof techniques from the literature. The increasing-domain and fixed-domain asymptotic settings are considered. Under increasing-domain asymptotics, it is shown that in general all the components of the covariance parameter can be estimated consistently by maximum likelihood and that asymptotic normality holds. In contrast, under fixed-domain asymptotics, only some components of the covariance parameter, constituting the microergodic parameter, can be estimated consistently. Under fixed-domain asymptotics, the special case of the family of isotropic Matérn covariance functions is considered. It is shown that only a combination of the variance and spatial scale parameter is microergodic. A consistency and asymptotic normality proof is sketched for maximum likelihood estimators.
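
As an illustration of the last point, the display below writes the isotropic Matérn covariance under one common parameterization, with variance $\sigma^2$, scale $\rho$ and known smoothness $\nu$, together with the combination usually reported as microergodic under fixed-domain asymptotics in dimension $d \le 3$. The symbols $\sigma^2$, $\rho$, $\nu$ and $\mathcal{K}_\nu$ (the modified Bessel function of the second kind) are notation assumed here and may differ from the article's own.

```latex
K_{\sigma^2,\rho}(h)
  = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)}
    \left( \frac{\|h\|}{\rho} \right)^{\nu}
    \mathcal{K}_{\nu}\!\left( \frac{\|h\|}{\rho} \right),
\qquad
\text{microergodic parameter: } \frac{\sigma^2}{\rho^{2\nu}} .
```

Under this reading, two parameter pairs $(\sigma_1^2, \rho_1)$ and $(\sigma_2^2, \rho_2)$ with $\sigma_1^2 / \rho_1^{2\nu} = \sigma_2^2 / \rho_2^{2\nu}$ cannot be told apart consistently from observations in a fixed bounded domain, which is why only the ratio is estimable.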

preprint 2020 arXiv

Asymptotically Equivalent Prediction in Multivariate Geostatistics

Cokriging is the common method of spatial interpolation (best linear unbiased prediction) in multivariate geostatistics. While best linear prediction has been well understood in univariate spatial statistics, the literature for the multivariate case has been elusive so far. The new challenges provided by modern spatial datasets, being typically multivariate, call for a deeper study of cokriging. In particular, we deal with the problem of misspecified cokriging prediction within the framework of fixed domain asymptotics. Specifically, we provide conditions for equivalence of measures associated with multivariate Gaussian random fields, with index set in a compact set of a d-dimensional Euclidean space. Such conditions have been elusive for about 50 years of spatial statistics. We then focus on the multivariate Matérn and Generalized Wendland classes of matrix-valued covariance functions, which have been very popular for having parameters that are crucial to spatial interpolation, and that control the mean square differentiability of the associated Gaussian process. We provide sufficient conditions for equivalence of Gaussian measures, relying on the covariance parameters of these two classes. This makes it possible to identify the parameters that are crucial to asymptotically equivalent interpolation in multivariate geostatistics. Our findings are then illustrated through simulation studies.

preprint 2020 arXiv

Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis

In this paper, we aim to estimate block-diagonal covariance matrices for Gaussian data in high dimension and in fixed dimension. We first estimate the block-diagonal structure of the covariance matrix by theoretical and practical estimators which are consistent. We deduce that the suggested estimator of the covariance matrix in high dimension converges at the same rate as if the true decomposition were known. In fixed dimension, we prove that the suggested estimator is asymptotically efficient. Then, we focus on the estimation of sensitivity indices called "Shapley effects", in the high-dimensional Gaussian linear framework. From the estimated covariance matrix, we obtain an estimator of the Shapley effects with a relative error which goes to zero at the parametric rate up to a logarithmic factor. Using the block-diagonal structure of the estimated covariance matrix, this estimator is still available for thousands of input variables, as long as the maximal block is not too large.

preprint 2020 arXiv

Gaussian linear approximation for the estimation of the Shapley effects

In this paper, we address the estimation of the sensitivity indices called "Shapley effects". These sensitivity indices make it possible to handle dependent input variables. The Shapley effects are generally difficult to estimate, but they are easily computable in the Gaussian linear framework. The aim of this work is to use the values of the Shapley effects in an approximated Gaussian linear framework as estimators of the true Shapley effects corresponding to a non-linear model. First, we assume that the input variables are Gaussian with small variances. We provide rates of convergence of the estimated Shapley effects to the true Shapley effects. Then, we focus on the case where the inputs are given by a non-Gaussian empirical mean. We prove that, under some mild assumptions, when the number of terms in the empirical mean increases, the difference between the true Shapley effects and the estimated Shapley effects given by the Gaussian linear approximation converges to 0. Our theoretical results are supported by numerical studies, showing that the Gaussian linear approximation is accurate and makes it possible to decrease the computational time significantly.
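
To make the "easily computable in the Gaussian linear framework" remark concrete, here is a minimal Python sketch. It assumes a linear model Y = beta . X with Gaussian inputs of covariance `cov`, and uses the standard definition of Shapley effects with value function Var(E[Y | X_u]). The function names `explained_variance` and `shapley_effects` are hypothetical and not taken from the paper or its code, and the brute-force enumeration over subsets is only practical for a handful of inputs.

```python
from itertools import combinations
from math import comb

import numpy as np


def explained_variance(beta, cov, subset):
    """Var(E[Y | X_subset]) for Y = beta @ X with Gaussian X of covariance cov."""
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    u = list(subset)
    total = float(beta @ cov @ beta)
    if not u:
        return 0.0
    rest = [j for j in range(len(beta)) if j not in u]
    if not rest:
        return total
    # Conditional covariance of X_rest given X_u (Schur complement).
    cond = cov[np.ix_(rest, rest)] - cov[np.ix_(rest, u)] @ np.linalg.solve(
        cov[np.ix_(u, u)], cov[np.ix_(u, rest)]
    )
    # Var(Y | X_u) is constant in the Gaussian linear case.
    return total - float(beta[rest] @ cond @ beta[rest])


def shapley_effects(beta, cov):
    """Normalized Shapley effects by exhaustive enumeration (small p only)."""
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    p = len(beta)
    total = float(beta @ cov @ beta)
    effects = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = 1.0 / (p * comb(p - 1, size))
            for u in combinations(others, size):
                gain = explained_variance(beta, cov, u + (i,)) - explained_variance(
                    beta, cov, u
                )
                effects[i] += weight * gain
    return effects / total


if __name__ == "__main__":
    # Two correlated inputs with equal coefficients share the variance equally.
    print(shapley_effects([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

With independent inputs the effects reduce to the usual variance shares beta_i^2 * var_i / Var(Y), and in all cases they sum to one, which is a quick sanity check on any implementation.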

preprint 2020 arXiv

Gaussian Processes indexed on the symmetric group: prediction and learning

In the framework of the supervised learning of a real function defined on a space X, the so-called Kriging method relies on a real Gaussian field defined on X. The Euclidean case is well known and has been widely studied. In this paper, we explore the less classical case where X is the non-commutative finite group of permutations. In this setting, we propose and study a harmonic analysis of the covariance operators that makes it possible to consider Gaussian process models and forecasting issues. Our theory is motivated by statistical ranking problems.

preprint 2020 arXiv

Rate of convergence for geometric inference based on the empirical Christoffel function

We consider the problem of estimating the support of a measure from a finite, independent, sample. The estimators which are considered are constructed based on the empirical Christoffel function. Such estimators have been proposed for the problem of set estimation with heuristic justifications. We carry out a detailed finite sample analysis, that allows us to select the threshold and degree parameters as a function of the sample size. We provide a convergence rate analysis of the resulting support estimation procedure. Our analysis establishes that we may obtain finite sample bounds which are comparable to existing rates for different set estimation procedures. Our results rely on concentration inequalities for the empirical Christoffel function and on estimates of the supremum of the Christoffel-Darboux kernel on sets with smooth boundaries, that can be considered of independent interest.

preprint 2020 arXiv

Semi-parametric estimation of the variogram of a Gaussian process with stationary increments

We consider the semi-parametric estimation of a scale parameter of a one-dimensional Gaussian process with known smoothness. We suggest an estimator based on quadratic variations and on the moment method. We provide asymptotic approximations of the mean and variance of this estimator, together with asymptotic normality results, for a large class of Gaussian processes. We allow for general mean functions and study the aggregation of several estimators based on various variation sequences. In extensive simulation studies, we show that the asymptotic results accurately depict the finite-sample situations already for small to moderate sample sizes. We also compare various variation sequences and highlight the efficiency of the aggregation procedure.

preprint 2020 arXiv

Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution

The Shapley effects are global sensitivity indices: they quantify the impact of each input variable on the output variable in a model. In this work, we suggest new estimators of these sensitivity indices. When the input distribution is known, we investigate the already existing estimator and suggest a new one with a lower variance. Then, when the distribution of the inputs is unknown, we extend these estimators. Finally, we provide asymptotic properties of the estimators studied in this article.

preprint 2019 arXiv

Spatial Blind Source Separation

Recently a blind source separation model was suggested for spatial data together with an estimator based on the simultaneous diagonalisation of two scatter matrices. The asymptotic properties of this estimator are derived here and a new estimator, based on the joint diagonalisation of more than two scatter matrices, is proposed. The asymptotic properties and merits of the novel estimator are verified in simulation studies. A real data example illustrates the method.

preprint 2016 arXiv

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

There has been a growing interest in providing models for multivariate spatial processes. A majority of these models specify a parametric matrix covariance function. Based on observations, the parameters are estimated by maximum likelihood or variants thereof. While the asymptotic properties of maximum likelihood estimators for univariate spatial processes have been analyzed in detail, maximum likelihood estimators for multivariate spatial processes have not received their deserved attention yet. In this article we consider the classical increasing-domain asymptotic setting restricting the minimum distance between the locations. Then, one of the main components to be studied from a theoretical point of view is the asymptotic positive definiteness of the underlying covariance matrix. Based on very weak assumptions on the matrix covariance function we show that the smallest eigenvalue of the covariance matrix is asymptotically bounded away from zero. Several practical implications are discussed as well.

preprint 2015 arXiv

Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case

In parametric estimation of covariance function of Gaussian processes, it is often the case that the true covariance function does not belong to the parametric set used for estimation. This situation is called the misspecified case. In this case, it has been shown that, for irregular spatial sampling of observation points, Cross Validation can yield smaller prediction errors than Maximum Likelihood. Motivated by this observation, we provide a general asymptotic analysis of the misspecified case, for independent and uniformly distributed observation points. We prove that the Maximum Likelihood estimator asymptotically minimizes a Kullback-Leibler divergence, within the misspecified parametric set, while Cross Validation asymptotically minimizes the integrated square prediction error. In a Monte Carlo simulation, we show that the covariance parameters estimated by Maximum Likelihood and Cross Validation, and the corresponding Kullback-Leibler divergences and integrated square prediction errors, can be strongly contrasting. On a more technical level, we provide new increasing-domain asymptotic results for independent and uniformly distributed observation points.

preprint 2015 arXiv

Improvement of code behaviour in a design of experiments by metamodeling

It is now common practice in nuclear engineering to base extensive studies on numerical computer models. These studies require running computer codes in potentially thousands of numerical configurations and without expert individual controls on the computational and physical aspects of each simulation. In this paper, we compare different statistical metamodeling techniques and show how metamodels can help to improve the global behaviour of codes in these extensive studies. We consider the metamodeling of the Germinal thermal-mechanical code by Kriging, kernel regression and neural networks. Kriging provides the most accurate predictions while neural networks yield the fastest metamodel functions. All three metamodels can conveniently detect strong computation failures. It is however significantly more challenging to detect code instabilities, that is, groups of computations that are all valid, but numerically inconsistent with one another. For code instability detection, we find that Kriging provides the most useful tools.

preprint 2015 arXiv

Optimal configurations of lines and a statistical application

Motivated by the construction of confidence intervals in statistics, we study optimal configurations of $2^d-1$ lines in real projective space $RP^{d-1}$. For small $d$, we determine line sets that numerically minimize a wide variety of potential functions among all configurations of $2^d-1$ lines through the origin. Numerical experiments verify that our findings make it possible to assess efficiently the tightness of a bound arising from the statistical literature.

preprint 2014 arXiv

Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes

Covariance parameter estimation of Gaussian processes is analyzed in an asymptotic framework. The spatial sampling is a randomly perturbed regular grid and its deviation from the perfect regular grid is controlled by a single scalar regularity parameter. Consistency and asymptotic normality are proved for the Maximum Likelihood and Cross Validation estimators of the covariance parameters. The asymptotic covariance matrices of the covariance parameter estimators are deterministic functions of the regularity parameter. By means of an exhaustive study of the asymptotic covariance matrices, it is shown that the estimation is improved when the regular grid is strongly perturbed. Hence, an asymptotic confirmation is given to the commonly admitted fact that using groups of observation points with small spacing is beneficial to covariance function estimation. Finally, the prediction error, using a consistent estimator of the covariance parameters, is analyzed in detail.

preprint 2014 arXiv

Hastings-Metropolis algorithm on Markov chains for small-probability estimation

Shielding studies in neutron transport, with Monte Carlo codes, yield challenging problems of small-probability estimation. The particularity of these studies is that the small probability to estimate is formulated in terms of the distribution of a Markov chain, instead of that of a random vector in more classical cases. Thus, it is not straightforward to adapt classical statistical methods, for estimating small probabilities involving random vectors, to these neutron-transport problems. A recent interacting-particle method for small-probability estimation, relying on the Hastings-Metropolis algorithm, is presented. It is shown how to adapt the Hastings-Metropolis algorithm when dealing with Markov chains. A convergence result is also shown. Then, the practical implementation of the resulting method for small-probability estimation is treated in detail, for a Monte Carlo shielding study. Finally, it is shown, for this study, that the proposed interacting-particle method considerably outperforms a simple Monte Carlo method, when the probability to estimate is small.

preprint 2013 arXiv

Calibration and improved prediction of computer models by universal Kriging

This paper addresses the use of experimental data for calibrating a computer model and improving its predictions of the underlying physical system. A global statistical approach is proposed in which the bias between the computer model and the physical system is modeled as a realization of a Gaussian process. The application of classical statistical inference to this statistical model yields a rigorous method for calibrating the computer model and for adding to its predictions a statistical correction based on experimental data. This statistical correction can substantially improve the calibrated computer model for predicting the physical system on new experimental conditions. Furthermore, a quantification of the uncertainty of this prediction is provided. Physical expertise on the calibration parameters can also be taken into account in a Bayesian framework. Finally, the method is applied to the thermal-hydraulic code FLICA 4, in a single phase friction model framework. It significantly improves the predictions of the thermal-hydraulic code FLICA 4.

preprint 2013 arXiv

Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification

The Maximum Likelihood (ML) and Cross Validation (CV) methods for estimating covariance hyper-parameters are compared, in the context of Kriging with a misspecified covariance structure. A two-step approach is used. First, the case of the estimation of a single variance hyper-parameter is addressed, for which the fixed correlation function is misspecified. A predictive variance based quality criterion is introduced and a closed-form expression of this criterion is derived. It is shown that when the correlation function is misspecified, the CV does better compared to ML, while ML is optimal when the model is well-specified. In the second step, the results of the first step are extended to the case when the hyper-parameters of the correlation function are also estimated from data.
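
The first step described above can be sketched numerically. The snippet assumes zero-mean simple Kriging with a fixed correlation matrix `corr`, and uses the standard closed-form leave-one-out identities: the ML estimate is the normalized quadratic form, while the CV estimate rescales the variance so that the standardized leave-one-out errors have unit mean square. The function name `variance_estimates` is hypothetical, and whether this matches the paper's exact criterion is an assumption.

```python
import numpy as np


def variance_estimates(y, corr):
    """Return (ML, leave-one-out CV) estimates of the variance hyper-parameter.

    Assumes a zero-mean Gaussian process observed at n points, with a fixed
    (possibly misspecified) n x n correlation matrix `corr`.
    """
    y = np.asarray(y, dtype=float)
    corr = np.asarray(corr, dtype=float)
    n = y.size
    corr_inv = np.linalg.inv(corr)
    alpha = corr_inv @ y
    # ML: sigma^2 = y^T R^{-1} y / n.
    sigma2_ml = float(y @ alpha) / n
    # CV: leave-one-out error e_i = alpha_i / (R^{-1})_ii, with predictive
    # variance sigma^2 / (R^{-1})_ii, so sum_i e_i^2 (R^{-1})_ii / (n sigma^2) = 1.
    sigma2_cv = float(np.sum(alpha**2 / np.diag(corr_inv))) / n
    return sigma2_ml, sigma2_cv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, size=30))
    dist = np.abs(x[:, None] - x[None, :])
    true_corr = np.exp(-dist / 0.2)          # exponential correlation
    y = np.linalg.cholesky(true_corr) @ rng.standard_normal(30)
    wrong_corr = np.exp(-dist / 0.05)        # deliberately misspecified scale
    print("well-specified:", variance_estimates(y, true_corr))
    print("misspecified:  ", variance_estimates(y, wrong_corr))
```

When `corr` is the identity, both estimates reduce to the mean of the squared observations; they separate only once the correlation structure matters, which is the regime the comparison is about.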
