Source author record

Hannes Leeb

Hannes Leeb appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Statistics Theory, Machine Learning, Methodology

Catalog footprint

What is connected

7 works

4 topics

4 close collaborators


Preprint, 2022, arXiv

Conditional predictive inference for stable algorithms

We investigate generically applicable and intuitively appealing prediction intervals based on $k$-fold cross validation. We focus on the conditional coverage probability of the proposed intervals, given the observations in the training sample (hence, training conditional validity), and show that it is close to the nominal level, in an appropriate sense, provided that the underlying algorithm used for computing point predictions is sufficiently stable when feature-response pairs are omitted. Our results are based on a finite sample analysis of the empirical distribution function of $k$-fold cross validation residuals and hold in non-parametric settings with only minimal assumptions on the error distribution. To illustrate our results, we also apply them to high-dimensional linear predictors, where we obtain uniform asymptotic training conditional validity as both sample size and dimension tend to infinity at the same rate and consistent parameter estimation typically fails. These results show that despite the serious problems of resampling procedures for inference on the unknown parameters (cf. Bickel and Freedman, 1983; El Karoui and Purdom, 2018; Mammen, 1996), cross validation methods can be successfully applied to obtain reliable predictive inference even in high dimensions and conditionally on the training data.
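The general idea of an interval built from $k$-fold cross validation residuals can be sketched as follows. This is only an illustration under stated assumptions, not the construction analysed in the paper: ordinary least squares stands in for the "stable algorithm", the interval is the point prediction shifted by empirical residual quantiles, and the names `kfold_residuals` and `kfold_prediction_interval` are hypothetical.

```python
import numpy as np

def kfold_residuals(X, y, k=5, seed=0):
    """Out-of-fold residuals: each y[i] is predicted by a fit that omitted fold(i)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    residuals = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Assumption: least squares is the point predictor.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        residuals[held_out] = y[held_out] - X[held_out] @ beta
    return residuals

def kfold_prediction_interval(X, y, X_new, k=5, alpha=0.1, seed=0):
    """Shift the full-sample prediction by empirical quantiles of the CV residuals."""
    residuals = kfold_residuals(X, y, k=k, seed=seed)
    lo, hi = np.quantile(residuals, [alpha / 2, 1 - alpha / 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    point = X_new @ beta
    return point + lo, point + hi
```

Whether such an interval attains its nominal level conditionally on the training sample depends on the stability of the predictor, which is the question the abstract addresses.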

Preprint, 2020, arXiv

Adaptive, Distribution-Free Prediction Intervals for Deep Networks

The machine learning literature contains several constructions for prediction intervals that are intuitively reasonable but ultimately ad-hoc in that they do not come with provable performance guarantees. We present methods from the statistics literature that can be used efficiently with neural networks under minimal assumptions with guaranteed performance. We propose a neural network that outputs three values instead of a single point estimate and optimizes a loss function motivated by the standard quantile regression loss. We provide two prediction interval methods with finite sample coverage guarantees solely under the assumption that the observations are independent and identically distributed. The first method leverages the conformal inference framework and provides average coverage. The second method provides a new, stronger guarantee by conditioning on the observed data. Lastly, unlike other prediction interval methods, our loss function does not compromise the predictive accuracy of the network. We demonstrate the ease of use of our procedures as well as their improvements over other methods on both simulated and real data. As most deep networks can easily be modified by our method to output predictions with valid prediction intervals, its use should become standard practice, much like reporting standard errors along with mean estimates.
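For reference, the standard quantile regression (pinball) loss that the abstract mentions as motivation can be written in a few lines. This sketch shows only that textbook loss; the paper's own three-output loss is not specified here and is not reproduced. The function name `pinball_loss` is hypothetical.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Mean quantile-regression loss of predicting q for targets y at level tau."""
    diff = np.asarray(y, dtype=float) - q
    # Under-prediction (diff > 0) costs tau per unit, over-prediction costs 1 - tau.
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Minimising over a constant q recovers an empirical tau-quantile of y.
y = np.arange(1, 101, dtype=float)
grid = np.arange(0.0, 101.0, 0.5)
best_q = grid[np.argmin([pinball_loss(y, q, 0.9) for q in grid])]
```

Because the loss is asymmetric, training a network output against it at levels such as 0.05 and 0.95 yields estimated lower and upper conditional quantiles rather than a conditional mean.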

Preprint, 2020, arXiv

Conformal prediction intervals for the individual treatment effect

We propose several prediction interval procedures for the individual treatment effect with either finite-sample or asymptotic coverage guarantee in a non-parametric regression setting, where non-linear regression functions, heteroskedasticity and non-Gaussianity are allowed. To construct the prediction intervals we use the conformal method of Vovk et al. (2005). In extensive simulations, we compare the coverage probability and interval length of our prediction interval procedures. We demonstrate that complex learning algorithms, such as neural networks, can lead to narrower prediction intervals than simple algorithms, such as linear regression, if the sample size is large enough.
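As background on the conformal method, a generic split conformal interval for an ordinary regression response might look like the sketch below. It is not one of the paper's procedures for the individual treatment effect: it assumes a single observed response, least squares as the learning algorithm, and absolute residuals as conformity scores, and the name `split_conformal_interval` is hypothetical.

```python
import numpy as np

def split_conformal_interval(X, y, X_new, alpha=0.1, seed=0):
    """Split conformal interval: fit on one half, calibrate on the other."""
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    fit_idx, cal_idx = perm[: n // 2], perm[n // 2:]
    # Assumption: least squares is the learning algorithm.
    beta, *_ = np.linalg.lstsq(X[fit_idx], y[fit_idx], rcond=None)
    scores = np.sort(np.abs(y[cal_idx] - X[cal_idx] @ beta))
    m = len(scores)
    # Finite-sample correction: use the ceil((1 - alpha)(m + 1))-th smallest score.
    rank = int(np.ceil((1 - alpha) * (m + 1)))
    half_width = scores[rank - 1] if rank <= m else np.inf
    point = X_new @ beta
    return point - half_width, point + half_width
```

Swapping the least squares fit for a more flexible learner changes only the two lines that compute `beta` and the predictions, which is why the abstract can compare interval lengths across learning algorithms.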

Preprint, 2016, arXiv

Leave-one-out prediction intervals in linear regression models with many variables

We study prediction intervals based on leave-one-out residuals in a linear regression model where the number of explanatory variables can be large compared to sample size. We establish uniform asymptotic validity (conditional on the training sample) of the proposed interval under minimal assumptions on the unknown error distribution and the high dimensional design. Our intervals are generic in the sense that they are valid for a large class of linear predictors used to obtain a point forecast, such as robust M-estimators, James-Stein type estimators and penalized estimators like the LASSO. These results show that despite the serious problems of resampling procedures for inference on the unknown parameters, leave-one-out methods can be successfully applied to obtain reliable predictive inference even in high dimensions.
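A minimal sketch of a leave-one-out interval, assuming ordinary least squares as the linear predictor (the paper covers a much larger class). For least squares, the leave-one-out residual has the closed form $e_i / (1 - h_{ii})$, where $e_i$ is the in-sample residual and $h_{ii}$ the leverage, so no refitting is needed. The function names are hypothetical, and the design matrix is assumed to have full column rank.

```python
import numpy as np

def loo_residuals(X, y):
    """Leave-one-out residuals of least squares via the leverage shortcut."""
    XtX_inv = np.linalg.inv(X.T @ X)          # assumes full column rank
    beta = XtX_inv @ X.T @ y
    leverage = np.sum((X @ XtX_inv) * X, axis=1)  # diagonal of the hat matrix
    return (y - X @ beta) / (1.0 - leverage)

def loo_prediction_interval(X, y, x_new, alpha=0.1):
    """Point forecast shifted by empirical quantiles of the leave-one-out residuals."""
    lo, hi = np.quantile(loo_residuals(X, y), [alpha / 2, 1 - alpha / 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    point = x_new @ beta
    return point + lo, point + hi
```

For predictors without such a shortcut, for example penalized estimators like the LASSO, the residuals would instead be obtained by refitting with each observation omitted.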

Preprint, 2015, arXiv

On Various Confidence Intervals Post-Model-Selection

We compare several confidence intervals after model selection in the setting recently studied by Berk et al. [Ann. Statist. 41 (2013) 802-837], where the goal is to cover not the true parameter but a certain nonstandard quantity of interest that depends on the selected model. In particular, we compare the PoSI-intervals that are proposed in that reference with the "naive" confidence interval, which is constructed as if the selected model were correct and fixed a priori (thus ignoring the presence of model selection). Overall, we find that the actual coverage probabilities of all these intervals deviate only moderately from the desired nominal coverage probability. This finding is in stark contrast to several papers in the existing literature, where the goal is to cover the true parameter.

Preprint, 2013, arXiv

On the conditional distributions of low-dimensional projections from high-dimensional data

We study the conditional distribution of low-dimensional projections from high-dimensional data, where the conditioning is on other low-dimensional projections. To fix ideas, consider a random d-vector Z that has a Lebesgue density and that is standardized so that $\mathbb{E}Z=0$ and $\mathbb{E}ZZ'=I_d$. Moreover, consider two projections defined by unit-vectors $α$ and $β$, namely a response $y=α'Z$ and an explanatory variable $x=β'Z$. It has long been known that the conditional mean of y given x is approximately linear in x under some regularity conditions; cf. Hall and Li [Ann. Statist. 21 (1993) 867-889]. However, a corresponding result for the conditional variance has not been available so far. We here show that the conditional variance of y given x is approximately constant in x (again, under some regularity conditions). These results hold uniformly in $α$ and for most $β$'s, provided only that the dimension of Z is large. In that sense, we see that most linear submodels of a high-dimensional overall model are approximately correct. Our findings provide new insights in a variety of modeling scenarios. We discuss several examples, including sliced inverse regression, sliced average variance estimation, generalized linear models under potential link violation, and sparse linear modeling.

Preprint, 2013, arXiv

Shrinkage estimators for prediction out-of-sample: Conditional performance

We find that, in a linear model, the James-Stein estimator, which dominates the maximum-likelihood estimator in terms of its in-sample prediction error, can perform poorly compared to the maximum-likelihood estimator in out-of-sample prediction. We give a detailed analysis of this phenomenon and discuss its implications. When evaluating the predictive performance of estimators, we treat the regressor matrix in the training data as fixed, i.e., we condition on the design variables. Our findings contrast those obtained by Baranchik (1973, Ann. Stat. 1:312-321) and, more recently, by Dicker (2012, arXiv:1102.2952) in an unconditional performance evaluation.
