Source author record

Johannes Lederer

Johannes Lederer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Methodology, math.ST, Statistics Theory, Applications, Artificial Intelligence, Computation, Neural and Evolutionary Computing, Quantitative Methods, math.PR, Computer Vision, cs.CY, econ.EM, Genomics

Catalog footprint

What is connected

25 works

14 topics

4 close collaborators


Preprint · 2022 · arXiv

Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression

Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated separately. In this paper, we propose an approach to entangle these two aspects in the context of regularized estimation. Applying our approach to sparse and group-sparse regression, we show that it can improve on standard pipelines both statistically and computationally.

Preprint · 2022 · arXiv

Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method

Deep sequencing has become one of the most popular tools for transcriptome profiling in biomedical studies. While an abundance of computational methods exists for "normalizing" sequencing data to remove unwanted between-sample variations due to experimental handling, there is no consensus on which normalization is the most suitable for a given data set. To address this problem, we developed "DANA", an approach for assessing the performance of normalization methods for microRNA sequencing data based on biology-motivated and data-driven metrics. Our approach takes advantage of well-known biological features of microRNAs, namely their expression pattern and chromosomal clustering, to simultaneously assess (1) how effectively normalization removes handling artifacts and (2) how aptly normalization preserves biological signals. With DANA, we confirm that the performance of eight commonly used normalization methods varies widely across data sets, and we provide guidance for selecting a suitable method for the data at hand. Hence, DANA should be adopted as a routine preprocessing step (preceding normalization) for microRNA sequencing data analysis. DANA is implemented in R and publicly available at https://github.com/LXQin/DANA.

Preprint · 2022 · arXiv

Estimating the Lasso's Effective Noise

Much of the theory for the lasso in the linear model $Y = X\beta^* + \varepsilon$ hinges on the quantity $2\|X^\top \varepsilon\|_{\infty} / n$, which we call the lasso's effective noise. Among other things, the effective noise plays an important role in finite-sample bounds for the lasso, the calibration of the lasso's tuning parameter, and inference on the parameter vector $\beta^*$. In this paper, we develop a bootstrap-based estimator of the quantiles of the effective noise. The estimator is fully data-driven; that is, it does not require any additional tuning parameters. We equip our estimator with finite-sample guarantees and apply it to tuning parameter calibration for the lasso and to high-dimensional inference on the parameter vector $\beta^*$.
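To make the quantity concrete, here is a minimal sketch that computes the effective noise $2\|X^\top\varepsilon\|_\infty/n$ for a given noise vector and approximates its $(1-\alpha)$-quantile by Monte Carlo. It assumes i.i.d. Gaussian noise with a known standard deviation `sigma`, which is an oracle setting. It is not the bootstrap estimator developed in the paper, which avoids knowing the noise. The function names are hypothetical.

```python
import numpy as np


def effective_noise(X, eps):
    """Lasso's effective noise 2 * ||X^T eps||_inf / n."""
    n = X.shape[0]
    return 2.0 * np.max(np.abs(X.T @ eps)) / n


def oracle_quantile(X, sigma, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of the effective noise.

    Assumes i.i.d. Gaussian noise with KNOWN standard deviation sigma,
    an oracle quantity, not the paper's data-driven bootstrap estimator.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = np.array(
        [effective_noise(X, sigma * rng.standard_normal(n)) for _ in range(n_sim)]
    )
    return np.quantile(draws, 1.0 - alpha)
```

A tuning parameter set to `oracle_quantile(X, sigma, alpha)` dominates the effective noise with probability about $1-\alpha$ in this simulated setting, which is the kind of calibration the abstract refers to.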

Preprint · 2022 · arXiv

Marginal Tail-Adaptive Normalizing Flows

Learning the tail behavior of a distribution is a notoriously difficult problem. By definition, the number of samples from the tail is small, and deep generative models, such as normalizing flows, tend to concentrate on learning the body of the distribution. In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. We prove that the marginal tailedness of an autoregressive flow can be controlled via the tailedness of the marginals of its base distribution. This theoretical insight leads us to a novel type of flows based on flexible base distributions and data-driven linear layers. An empirical analysis shows that the proposed method improves accuracy, especially in the tails of the distribution, and is able to generate heavy-tailed data. We demonstrate its application on a weather and climate example, in which capturing the tail behavior is essential.

Preprint · 2022 · arXiv

VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g., when there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieves satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.

Preprint · 2021 · arXiv

Activation Functions in Artificial Neural Networks: A Systematic Overview

Activation functions shape the outputs of artificial neurons and, therefore, are integral parts of neural networks in general and deep learning in particular. Some activation functions, such as logistic and relu, have been used for many decades. But with deep learning becoming a mainstream research topic, new activation functions have mushroomed, leading to confusion in both theory and practice. This paper provides an analytic yet up-to-date overview of popular activation functions and their properties, which makes it a timely resource for anyone who studies or applies neural networks.
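As a small illustration of the two long-standing activation functions named above, the sketch below implements the logistic function and relu elementwise with NumPy. Writing the logistic function as $\tfrac12(1+\tanh(z/2))$, which is mathematically identical to $1/(1+e^{-z})$ but avoids overflow for large negative inputs, is our implementation choice and not taken from the overview.

```python
import numpy as np


def logistic(z):
    """Logistic (sigmoid) activation 1 / (1 + exp(-z)).

    Computed as 0.5 * (1 + tanh(z / 2)), an equivalent form that does not
    overflow for large negative z.
    """
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))


def relu(z):
    """Rectified linear unit max(z, 0), applied elementwise."""
    return np.maximum(np.asarray(z, dtype=float), 0.0)
```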

Preprint · 2021 · arXiv

Aggregating Knockoffs for False Discovery Rate Control with an Application to Gut Microbiome Data

Recent discoveries suggest that our gut microbiome plays an important role in our health and wellbeing. However, gut microbiome data are intricate; for example, the microbial diversity in the gut makes the data high-dimensional. While there are dedicated high-dimensional methods, such as the lasso estimator, they always come with the risk of false discoveries. Knockoffs are a recent approach to control the number of false discoveries. In this paper, we show that knockoffs can be aggregated to increase power while retaining sharp control over the false discoveries. We support our method both theoretically and in simulations, and we show that it can lead to new discoveries on microbiome data from the American Gut Project. In particular, our results indicate that several phyla that have been overlooked so far are associated with obesity.
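The selection step that knockoff methods build on can be sketched with the knockoff+ threshold of Barber and Candès. Given one statistic $W_j$ per variable, where large positive values favor the original variable over its knockoff, select every $j$ with $W_j \ge T$, where $T$ is the smallest $t > 0$ such that $(1 + \#\{j : W_j \le -t\}) / \max(1, \#\{j : W_j \ge t\}) \le q$. This is the standard single-knockoff filter, not the aggregation scheme proposed in the paper; constructing the knockoffs and the statistics $W_j$ is not shown, and the function names are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Knockoff+ threshold for target FDR level q (inf if nothing qualifies)."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q=0.1):
    """Indices of the variables selected by the knockoff+ filter."""
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(np.asarray(W, dtype=float) >= t)
```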

Preprint · 2021 · arXiv

False Discovery Rates in Biological Networks

The increasing availability of data has generated unprecedented prospects for network analyses in many biological fields, such as neuroscience (e.g., brain networks), genomics (e.g., gene-gene interaction networks), and ecology (e.g., species interaction networks). Gaussian graphical models provide a powerful statistical framework for estimating such networks, but standard estimators for the corresponding graphs are prone to large numbers of false discoveries. In this paper, we introduce a novel graph estimator based on knockoffs that imitate the partial correlation structures of unconnected nodes. We show that this new estimator guarantees accurate control of the false discovery rate in theory, simulations, and biological applications, and we provide easy-to-use R code.
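In a Gaussian graphical model, two nodes are unconnected exactly when their partial correlation given all other nodes is zero, and the partial correlations can be read off the precision matrix $\Theta = \Sigma^{-1}$ via $\rho_{ij} = -\Theta_{ij}/\sqrt{\Theta_{ii}\Theta_{jj}}$. The sketch below computes the graph from a known covariance matrix. It only illustrates the target of graph estimation; it is not the knockoff-based estimator of the paper, and the numerical tolerance `tol` is our assumption.

```python
import numpy as np


def partial_correlations(sigma):
    """Partial correlation matrix of a Gaussian vector with covariance sigma."""
    theta = np.linalg.inv(sigma)          # precision matrix
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def graph_edges(sigma, tol=1e-10):
    """Edges (i, j), i < j, whose partial correlation is nonzero up to tol."""
    rho = partial_correlations(sigma)
    p = rho.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(rho[i, j]) > tol]
```

For example, a chain-structured precision matrix yields only the edges along the chain, even though the corresponding covariance matrix has no zero entries.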

Preprint · 2021 · arXiv

Optimization Landscapes of Wide Deep Neural Networks Are Benign

We analyze the optimization landscapes of deep learning with wide networks. We highlight the importance of constraints for such networks and show that constrained as well as unconstrained empirical-risk minimization over such networks has no confined points, that is, suboptimal parameters that are difficult to escape from. Hence, our theories substantiate the common belief that wide neural networks are not only highly expressive but also comparatively easy to optimize.

Preprint · 2020 · arXiv

A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics

We introduce tools for controlled variable selection to economists. In particular, we apply a recently introduced aggregation scheme for false discovery rate (FDR) control to German administrative data to determine the parts of the individual employment histories that are relevant for the career outcomes of women. Our results suggest that career outcomes can be predicted based on a small set of variables, such as daily earnings, wage increases in combination with a high level of education, employment status, and working experience.

Preprint · 2020 · arXiv

Is there a role for statistics in artificial intelligence?

The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.

Preprint · 2020 · arXiv

Layer Sparsity in Neural Networks

Sparsity has become popular in machine learning, because it can save computational resources, facilitate interpretations, and prevent overfitting. In this paper, we discuss sparsity in the framework of neural networks. In particular, we formulate a new notion of sparsity that concerns the networks' layers and, therefore, aligns particularly well with the current trend toward deep networks. We call this notion layer sparsity. We then introduce corresponding regularization and refitting schemes that can complement standard deep-learning pipelines to generate more compact and accurate networks.

Preprint · 2020 · arXiv

Risk Bounds for Robust Deep Learning

It has been observed that certain loss functions can render deep-learning pipelines robust against flaws in the data. In this paper, we support these empirical findings with statistical theory. We especially show that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data. More generally speaking, our paper provides theoretical evidence for the benefits of robust loss functions in deep learning.
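For reference, the loss functions named above can be written as functions of the residual $r$. The sketch uses the common textbook forms; the default tuning constants (`delta=1.0`, `c=1.0`, `c=4.685`) are conventional choices we assume for illustration, not values from the paper.

```python
import numpy as np


def lad_loss(r):
    """Least-absolute deviation loss |r|."""
    return np.abs(r)


def huber_loss(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))


def cauchy_loss(r, c=1.0):
    """(c^2 / 2) * log(1 + (r / c)^2)."""
    return 0.5 * c**2 * np.log1p((r / c) ** 2)


def tukey_biweight_loss(r, c=4.685):
    """(c^2 / 6) * (1 - (1 - (r / c)^2)^3) for |r| <= c, and c^2 / 6 beyond."""
    u = np.minimum(np.abs(r) / c, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)
```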

Preprint · 2020 · arXiv

Tuning-free ridge estimators for high-dimensional generalized linear models

Ridge estimators regularize the squared Euclidean lengths of parameters. Such estimators are mathematically and computationally attractive but involve tuning parameters that can be difficult to calibrate. In this paper, we show that ridge estimators can be modified such that tuning parameters can be avoided altogether. We also show that these modified versions can improve on the empirical prediction accuracies of standard ridge estimators combined with cross-validation, and we provide first theoretical guarantees.
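For comparison, the standard ridge estimator in the linear model (the simplest generalized linear model) has the closed form $\hat\beta_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$. The sketch computes it for a given $\lambda$, which makes visible the tuning parameter that the paper's modification removes; the tuning-free version itself is not reproduced here.

```python
import numpy as np


def ridge(X, y, lam):
    """Standard ridge estimate argmin_b ||y - X b||_2^2 + lam * ||b||_2^2."""
    p = X.shape[1]
    # Solving the regularized normal equations is more stable than inverting.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

With `lam = 0` and a full-rank design this reduces to least squares; as `lam` grows, the estimate shrinks toward zero, and choosing `lam` usually requires cross-validation.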

Preprint · 2016 · arXiv

A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees

We introduce a novel scheme for choosing the regularization parameter in high-dimensional linear regression with Lasso. This scheme, inspired by Lepski's method for bandwidth selection in non-parametric regression, is equipped with both optimal finite-sample guarantees and a fast algorithm. In particular, for any design matrix such that the Lasso has low sup-norm error under an "oracle choice" of the regularization parameter, we show that our method matches the oracle performance up to a small constant factor, and show that it can be implemented by performing simple tests along a single Lasso path. By applying the Lasso to simulated and real data, we find that our novel scheme can be faster and more accurate than standard schemes such as Cross-Validation.
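A Lepski-type rule of this kind can be sketched as follows, assuming the form we recall for the paper's criterion: walking down a grid of tuning parameters, keep $\lambda$ as long as every pair $\lambda', \lambda'' \ge \lambda$ satisfies $\|\hat\beta_{\lambda'} - \hat\beta_{\lambda''}\|_\infty / (\lambda' + \lambda'') \le \bar C$. The constant `c_bar` and its default of 0.75 are assumptions to check against the paper, and the Lasso path `betas` is taken as given, so only simple comparisons along a single path are needed.

```python
import numpy as np


def lepski_type_select(lambdas, betas, c_bar=0.75):
    """Hypothetical Lepski-type choice of the Lasso tuning parameter.

    lambdas: tuning parameters of a single Lasso path (any order).
    betas:   betas[i] is the Lasso estimate computed at lambdas[i].
    c_bar:   threshold constant (assumed value).
    Returns the smallest lambda such that every pair of estimates at
    tuning parameters >= lambda passes the sup-norm test.
    """
    order = np.argsort(lambdas)[::-1]            # largest lambda first
    lam = np.asarray(lambdas, dtype=float)[order]
    est = np.asarray(betas, dtype=float)[order]
    selected = lam[0]
    for k in range(1, len(lam)):
        # Pairs among larger lambdas were checked in earlier iterations,
        # so only the new estimate must be compared with all previous ones.
        passes = all(
            np.max(np.abs(est[k] - est[j])) / (lam[k] + lam[j]) <= c_bar
            for j in range(k)
        )
        if not passes:
            break
        selected = lam[k]
    return selected
```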

Preprint · 2016 · arXiv

On the Prediction Performance of the Lasso

Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that the incorporation of a simple correlation measure into the tuning parameter can lead to a nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. We finally show that our results also lead to near-optimal rates for the least-squares estimator with total variation penalty.

Preprint · 2015 · arXiv

Compute Less to Get More: Using ORC to Improve Sparse Filtering

Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering with spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests early stopping of Sparse Filtering. We therefore introduce the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We show that this stopping criterion is related to pre-processing procedures such as Statistical Whitening and demonstrate that it can make image classification with Sparse Filtering considerably faster and more accurate.

Preprint · 2015 · arXiv

Don't Fall for Tuning Parameters: Tuning-Free Variable Selection in High Dimensions With the TREX

Lasso is a seminal contribution to high-dimensional statistics, but it hinges on a tuning parameter that is difficult to calibrate in practice. A partial remedy for this problem is Square-Root Lasso, because it inherently calibrates to the noise variance. However, Square-Root Lasso still requires the calibration of a tuning parameter to all other aspects of the model. In this study, we introduce TREX, an alternative to Lasso with an inherent calibration to all aspects of the model. This adaptation to the entire model renders TREX an estimator that does not require any calibration of tuning parameters. We show that TREX can outperform cross-validated Lasso in terms of variable selection and computational efficiency. We also introduce a bootstrapped version of TREX that can further improve variable selection. We illustrate the promising performance of TREX both on synthetic data and on a recent high-dimensional biological data set that considers riboflavin production in B. subtilis.
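To show what calibration to "all aspects of the model" can look like, the sketch evaluates a TREX-type objective in the form we recall from the TREX literature, $\|Y - X\beta\|_2^2 / (c\,\|X^\top(Y - X\beta)\|_\infty) + \|\beta\|_1$ with $c = 1/2$; treat this form and the default for `c` as assumptions. The data-dependent denominator takes the place of the Lasso's tuning parameter. The objective is nonconvex, and minimizing it is not shown.

```python
import numpy as np


def trex_objective(beta, X, y, c=0.5):
    """TREX-type objective (form and default c assumed, see above)."""
    residual = y - X @ beta
    denom = c * np.max(np.abs(X.T @ residual))
    if denom == 0:
        # The ratio is undefined when X^T residual = 0; returning inf is our convention.
        return np.inf
    return residual @ residual / denom + np.sum(np.abs(beta))
```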

Preprint · 2014 · arXiv

A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression

We introduce a robust and fully adaptive method for pointwise estimation in heteroscedastic regression. We allow for noise and design distributions that are unknown and fulfill very weak assumptions only. In particular, we do not impose moment conditions on the noise distribution. Moreover, we do not require a positive density for the design distribution. In a first step, we study the consistency of locally polynomial M-estimators that consist of a contrast and a kernel. Afterwards, minimax results are established over unidimensional Hölder spaces for degenerate design. We then choose the contrast and the kernel that minimize an empirical variance term and demonstrate that the corresponding M-estimator is adaptive with respect to the noise and design distributions and adaptive (Huber) minimax for contamination models. In a second step, we additionally choose a data-driven bandwidth via Lepski's method. This leads to an M-estimator that is adaptive with respect to the noise and design distributions and, additionally, adaptive with respect to the smoothness of an isotropic, multivariate, locally polynomial target function. These results are also extended to anisotropic, locally constant target functions. Our data-driven approach provides, in particular, a level of robustness that adapts to the noise, contamination, and outliers.

Preprint · 2014 · arXiv

New concentration inequalities for suprema of empirical processes

While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration.

Preprint · 2014 · arXiv

Topology Adaptive Graph Estimation in High Dimensions

We introduce Graphical TREX (GTREX), a novel method for graph estimation in high-dimensional Gaussian graphical models. By conducting neighborhood selection with TREX, GTREX avoids tuning parameters and is adaptive to the graph topology. We compare GTREX with standard methods on a new simulation set-up that is designed to assess accurately the strengths and shortcomings of different methods. These simulations show that a neighborhood selection scheme based on Lasso and an optimal (in practice unknown) tuning parameter outperforms other standard methods over a large spectrum of scenarios. Moreover, we show that GTREX can rival this scheme and, therefore, can provide competitive graph estimation without the need for tuning parameter calibration.

Preprint · 2013 · arXiv

The Group Square-Root Lasso: Theoretical Properties and Fast Algorithms

We introduce and study the Group Square-Root Lasso (GSRL) method for estimation in high-dimensional sparse regression models with group structure. The new estimator minimizes the square root of the residual sum of squares plus a penalty term proportional to the sum of the Euclidean norms of groups of the regression parameter vector. The net advantage of the method over the existing Group Lasso (GL)-type procedures lies in the form of the proportionality factor used in the penalty term, which for GSRL is independent of the variance of the error terms. This is of crucial importance in models with more parameters than the sample size, where estimating the variance of the noise becomes as difficult as the original problem. We show that the GSRL estimator adapts to the unknown sparsity of the regression vector and has the same optimal estimation and prediction accuracy as the GL estimators, under the same minimal conditions on the model. This extends the results recently established for the Square-Root Lasso, for sparse regression without group structure. Moreover, as a new type of result for Square-Root Lasso methods, with or without groups, we study correct pattern recovery and show that it can be achieved under conditions similar to those needed by the Lasso or Group-Lasso-type methods, but with a simplified tuning strategy. We implement our method via a new algorithm, with proved convergence properties, which, unlike existing methods, scales well with the dimension of the problem. Our simulation studies strongly support our theoretical findings.
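The objective described above can be evaluated directly. The sketch assumes groups are given as lists of coefficient indices and omits any $1/\sqrt{n}$ scaling or per-group weights (our simplifications). It also shows why the penalty factor need not depend on the noise level: rescaling $y$ and $\beta$ by $s > 0$ rescales both terms, and hence the whole objective, by $s$, whereas the squared residual sum of squares in the Group Lasso scales by $s^2$ against a penalty that scales by $s$.

```python
import numpy as np


def group_sqrt_lasso_objective(beta, X, y, groups, lam):
    """sqrt(RSS) + lam * sum of group Euclidean norms.

    groups: list of index lists partitioning the coefficients (assumed format).
    No 1/sqrt(n) scaling or per-group weights (simplifying assumption).
    """
    residual = y - X @ beta
    penalty = sum(np.linalg.norm(beta[g]) for g in groups)
    return np.linalg.norm(residual) + lam * penalty
```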

Preprint · 2013 · arXiv

Trust, but verify: benefits and pitfalls of least-squares refitting in high dimensions

Least-squares refitting is widely used in high-dimensional regression to reduce the prediction bias of l1-penalized estimators (e.g., Lasso and Square-Root Lasso). We present theoretical and numerical results that provide new insights into the benefits and pitfalls of least-squares refitting. In particular, we consider both prediction and estimation, and we pay close attention to the effects of correlations in the design matrices of linear regression models, since these correlations, although often neglected, are crucial in the context of linear regression, especially in high-dimensional contexts. First, we demonstrate that the benefit of least-squares refitting strongly depends on the setting and task under consideration: least-squares refitting can be beneficial even for settings with highly correlated design matrices but is not advisable for all settings, and least-squares refitting can be beneficial for estimation but performs better still for prediction. Finally, we introduce a criterion that indicates whether least-squares refitting is advisable for a specific setting and task under consideration, and we conduct a thorough simulation study involving the Lasso to show the usefulness of this criterion.
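The refitting step itself can be sketched in two stages: fit the Lasso, then rerun ordinary least squares on the selected variables. The sketch uses a plain coordinate-descent solver for $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ written for illustration (it is not the implementation used in the paper), and it does not include the paper's criterion for deciding whether refitting is advisable.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1 / 2n) ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()     # y - X @ beta with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual excluding j.
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def ls_refit(X, y, beta_lasso):
    """Ordinary least squares restricted to the Lasso's support."""
    support = np.flatnonzero(beta_lasso != 0)
    beta = np.zeros(X.shape[1])
    if support.size:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        beta[support] = coef
    return beta
```

In a noise-free example, the Lasso coefficients are shrunk toward zero while the refitted ones recover the true values, which is the bias reduction the abstract refers to; with noise and correlated designs, the picture becomes the nuanced one described above.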

Preprint · 2011 · arXiv

The Bernstein-Orlicz norm and deviation inequalities

We introduce two new concepts designed for the study of empirical processes. First, we introduce a new Orlicz norm which we call the Bernstein-Orlicz norm. This new norm interpolates sub-Gaussian and sub-exponential tail behavior. In particular, we show how this norm can be used to simplify the derivation of deviation inequalities for suprema of collections of random variables. Secondly, we introduce chaining and generic chaining along a tree. These simplify the well-known concepts of chaining and generic chaining. The supremum of the empirical process is then studied as a special case. We show that chaining along a tree can be done using entropy with bracketing. Finally, we establish a deviation inequality for the empirical process for the unbounded case.
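As we recall it from this line of work, the Bernstein-Orlicz norm is the Orlicz norm $\|Z\|_{\Psi_L} = \inf\{c > 0 : \mathbb{E}\,\Psi_L(|Z|/c) \le 1\}$ built from $\Psi_L(z) = \exp\big[\big((\sqrt{1 + 2Lz} - 1)/L\big)^2\big] - 1$ with $L > 0$; treat this definition as an assumption to check against the paper. The sketch evaluates $\Psi_L$ and its exponent to show the interpolation: for small $Lz$ the exponent is close to $z^2$ (sub-Gaussian behavior), and for large $z$ it grows like $2z/L$ (sub-exponential behavior).

```python
import numpy as np


def bo_exponent(z, L):
    """Exponent ((sqrt(1 + 2 L z) - 1) / L)^2 of the Bernstein-Orlicz function."""
    z = np.asarray(z, dtype=float)
    return ((np.sqrt(1.0 + 2.0 * L * z) - 1.0) / L) ** 2


def bernstein_orlicz_psi(z, L):
    """Psi_L(z) = exp(bo_exponent(z, L)) - 1, for z >= 0 and L > 0."""
    return np.expm1(bo_exponent(z, L))
```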

Preprint · 2011 · arXiv

The Lasso, correlated design, and improved oracle inequalities

We study high-dimensional linear models and the $\ell_1$-penalized least squares estimator, also known as the Lasso estimator. In the literature, oracle inequalities have been derived under restricted eigenvalue or compatibility conditions. In this paper, we complement this with entropy conditions, which allow one to improve the dual norm bound, and demonstrate how this leads to new oracle inequalities. The new oracle inequalities show that a smaller choice for the tuning parameter and a trade-off between $\ell_1$-norms and small compatibility constants are possible. This implies, in particular for correlated design, improved bounds for the prediction error of the Lasso estimator as compared to the methods based on restricted eigenvalue or compatibility conditions only.
