Researcher profile

Peter Bühlmann

Peter Bühlmann contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
15 works
0 followers
7 topics
4 close collaborators

Published work

15 published items

preprint | 2026 | arXiv

Causal Invariance Learning via Efficient Nonconvex Optimization

Identifying the causal relationship among variables from observational data is an important yet challenging task. This work focuses on identifying the direct causes of an outcome and estimating their magnitude, i.e., learning the causal outcome model. Data from multiple environments provide valuable opportunities to uncover causality by exploiting the invariance principle that the causal outcome model holds across heterogeneous environments. Based on the invariance principle, we propose the Negative Weighted Distributionally Robust Optimization (NegDRO) framework to learn an invariant prediction model. NegDRO minimizes the worst-case combination of risks across multiple environments and enforces invariance by allowing potential negative weights. Under the additive interventions regime, we establish three major contributions: (i) On the statistical side, we provide sufficient and nearly necessary identification conditions under which the invariant prediction model coincides with the causal outcome model; (ii) On the optimization side, despite the nonconvexity of NegDRO, we establish its benign optimization landscape, where all stationary points lie close to the true causal outcome model; (iii) On the computational side, we develop a gradient-based algorithm that provably converges to the causal outcome model, with non-asymptotic convergence rates in both sample size and gradient-descent iterations. In particular, our method avoids exhaustive combinatorial searches over exponentially many subsets of covariates found in the literature, ensuring scalability even when the dimension of the covariates is large. To our knowledge, this is the first causal invariance learning method that finds the approximate global optimality for a nonconvex optimization problem efficiently.

preprint | 2024 | arXiv

Robustness Against Weak or Invalid Instruments: Exploring Nonlinear Treatment Models with Machine Learning

We discuss causal inference for observational studies with possibly invalid instrumental variables. We propose a novel methodology called two-stage curvature identification (TSCI) by exploring the nonlinear treatment model with machine learning. The first-stage machine learning enables improving the instrumental variable's strength and adjusting for different forms of violating the instrumental variable assumptions. The success of TSCI requires the instrumental variable's effect on treatment to differ from its violation form. A novel bias correction step is implemented to remove bias resulting from the potentially high complexity of machine learning. Our proposed TSCI estimator is shown to be asymptotically unbiased and Gaussian even if the machine learning algorithm does not consistently estimate the treatment model. Furthermore, we design a data-dependent method to choose the best among several candidate violation forms. We apply TSCI to study the effect of education on earnings.

preprint | 2022 | arXiv

A Fast Non-parametric Approach for Local Causal Structure Learning

We study the problem of causal structure learning with essentially no assumptions on the functional relationships and noise. We develop DAG-FOCI, a computationally fast algorithm for this setting that is based on the FOCI variable selection algorithm of Azadkia and Chatterjee (2021). DAG-FOCI outputs the set of parents of a response variable of interest. We provide theoretical guarantees of our procedure when the underlying graph does not contain any (undirected) cycle containing the response variable of interest. Furthermore, in the absence of this assumption, we give a conservative guarantee against false positive causal claims when the set of parents is identifiable. We demonstrate the applicability of DAG-FOCI on simulated data as well as a real dataset from computational biology (Sachs et al., 2005).

preprint | 2022 | arXiv

Distributional Anchor Regression

Prediction models often fail if train and test data do not stem from the same distribution. Out-of-distribution (OOD) generalization to unseen, perturbed test data is a desirable but difficult-to-achieve property for prediction models and in general requires strong assumptions on the data generating process (DGP). In a causally inspired perspective on OOD generalization, the test data arise from a specific class of interventions on exogenous random variables of the DGP, called anchors. Anchor regression models, introduced by Rothenhaeusler et al. (2021), protect against distributional shifts in the test data by employing causal regularization. However, so far anchor regression has only been used with a squared-error loss which is inapplicable to common responses such as censored continuous or ordinal data. Here, we propose a distributional version of anchor regression which generalizes the method to potentially censored responses with at least an ordered sample space. To this end, we combine a flexible class of parametric transformation models for distributional regression with an appropriate causal regularizer under a more general notion of residuals. In an exemplary application and several simulation scenarios we demonstrate the extent to which OOD generalization is possible.

preprint | 2022 | arXiv

Double-estimation-friendly inference for high-dimensional misspecified models

All models may be wrong -- but that is not necessarily a problem for inference. Consider the standard $t$-test for the significance of a variable $X$ for predicting response $Y$ whilst controlling for $p$ other covariates $Z$ in a random design linear model. This yields correct asymptotic type I error control for the null hypothesis that $X$ is conditionally independent of $Y$ given $Z$ under an arbitrary regression model of $Y$ on $(X, Z)$, provided that a linear regression model for $X$ on $Z$ holds. An analogous robustness to misspecification, which we term the "double-estimation-friendly" (DEF) property, also holds for Wald tests in generalised linear models, with some small modifications. In this expository paper we explore this phenomenon, and propose methodology for high-dimensional regression settings that respects the DEF property. We advocate specifying (sparse) generalised linear regression models for both $Y$ and the covariate of interest $X$; our framework gives valid inference for the conditional independence null if either of these holds. In the special case where both specifications are linear, our proposal amounts to a small modification of the popular debiased Lasso test. We also investigate constructing confidence intervals for the regression coefficient of $X$ via inverting our tests; these have coverage guarantees even in partially linear models where the contribution of $Z$ to $Y$ can be arbitrary. Numerical experiments demonstrate the effectiveness of the methodology.

preprint | 2022 | arXiv

Structure Learning for Directed Trees

Knowing the causal structure of a system is of fundamental interest in many areas of science and can aid the design of prediction algorithms that work well under manipulations to the system. The causal structure becomes identifiable from the observational distribution under certain restrictions. To learn the structure from data, score-based methods evaluate different graphs according to the quality of their fits. However, for large, continuous, and nonlinear models, these rely on heuristic optimization approaches with no general guarantees of recovering the true causal structure. In this paper, we consider structure learning of directed trees. We propose a fast and scalable method based on Chu-Liu-Edmonds' algorithm we call causal additive trees (CAT). For the case of Gaussian errors, we prove consistency in an asymptotic regime with a vanishing identifiability gap. We also introduce two methods for testing substructure hypotheses with asymptotic family-wise error rate control that is valid post-selection and in unidentified settings. Furthermore, we study the identifiability gap, which quantifies how much better the true causal model fits the observational distribution, and prove that it is lower bounded by local properties of the causal model. Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods.

preprint | 2022 | arXiv

The Weighted Generalised Covariance Measure

We introduce a new test for conditional independence which is based on what we call the weighted generalised covariance measure (WGCM). It is an extension of the recently introduced generalised covariance measure (GCM). To test the null hypothesis of X and Y being conditionally independent given Z, our test statistic is a weighted form of the sample covariance between the residuals of nonlinearly regressing X and Y on Z. We propose different variants of the test for both univariate and multivariate X and Y. We give conditions under which the tests yield the correct type I error rate. Finally, we compare our novel tests to the original GCM using simulation and on real data sets. Typically, our tests have power against a wider class of alternatives compared to the GCM. This comes at the cost of having less power against alternatives for which the GCM already works well. In the special case of binary or categorical X and Y, one of our tests has power against all alternatives.
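
The residual-product idea in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes scalar X, Y and Z, replaces the nonlinear regressions with ordinary simple linear regression as a stand-in, and uses a sign-based weight function chosen purely for illustration. The names `linear_residuals` and `weighted_gcm` are hypothetical.

```python
import math
import random


def linear_residuals(target, z):
    """Residuals of a simple linear regression of target on z (with intercept).

    Assumption: a linear fit stands in for the nonlinear regressions
    described in the abstract.
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_t = sum(target) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    slope = sum((zi - mean_z) * (ti - mean_t) for zi, ti in zip(z, target)) / szz
    return [ti - mean_t - slope * (zi - mean_z) for zi, ti in zip(z, target)]


def weighted_gcm(x, y, z, weight=lambda zi: 1.0):
    """Normalised weighted residual covariance and a two-sided normal p-value.

    With the default constant weight this reduces to an unweighted
    residual-product statistic.
    """
    rx = linear_residuals(x, z)
    ry = linear_residuals(y, z)
    prods = [weight(zi) * a * b for zi, a, b in zip(z, rx, ry)]
    n = len(prods)
    mean = sum(prods) / n
    var = sum(p * p for p in prods) / n - mean ** 2
    stat = math.sqrt(n) * mean / math.sqrt(var)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|stat|))
    return stat, p_value


# Toy data: y_dep depends on x directly, so X and Y are dependent given Z.
random.seed(0)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y_dep = [xi + 0.1 * random.gauss(0, 1) for xi in x]

stat_dep, p_dep = weighted_gcm(x, y_dep, z)
# Hypothetical weight function: the sign of z.
stat_w, p_w = weighted_gcm(x, y_dep, z, weight=lambda zi: 1.0 if zi > 0 else -1.0)
```

Passing several different weight functions and combining the resulting statistics is one way a weighted test could gain power against alternatives that an unweighted statistic misses; how the paper combines them is not reproduced here.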

preprint | 2021 | arXiv

Graphical Elastic Net and Target Matrices: Fast Algorithms and Software for Sparse Precision Matrix Estimation

We consider estimation of undirected Gaussian graphical models and inverse covariances in high-dimensional scenarios by penalizing the corresponding precision matrix. While single $L_1$ (Graphical Lasso) and $L_2$ (Graphical Ridge) penalties for the precision matrix have already been studied, we propose the combination of both, yielding an Elastic Net type penalty. We enable additional flexibility by allowing the inclusion of diagonal target matrices for the precision matrix. We generalize existing algorithms for the Graphical Lasso and provide corresponding software with an efficient implementation to facilitate usage for practitioners. Our software borrows computationally favorable parts from a number of existing packages for the Graphical Lasso, leading to an overall fast(er) implementation while also yielding much more methodological flexibility.

preprint | 2021 | arXiv

Multicarving for high-dimensional post-selection inference

We consider post-selection inference for high-dimensional (generalized) linear models. Data carving (Fithian et al., 2014) is a promising technique to perform this task. However, it suffers from the instability of the model selector and hence, may lead to poor replicability, especially in high-dimensional settings. We propose the multicarve method inspired by multisplitting to improve upon stability and replicability. Furthermore, we extend existing concepts to group inference and illustrate the applicability of the methodology also for generalized linear models.
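
The abstract says multicarving is inspired by multisplitting, where p-values from repeated random splits are combined into one. A minimal sketch of a quantile-based aggregation rule of that kind is given below. It assumes the rule carries over to carved p-values; the function name `aggregate_pvalues` and the example values are hypothetical.

```python
import math


def aggregate_pvalues(pvals, gamma=0.5):
    """Combine p-values from repeated splits using a scaled empirical quantile.

    Returns min(1, gamma-quantile of {p / gamma}), where the empirical
    quantile is the smallest value with at least a fraction gamma of the
    scaled p-values at or below it.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    if not pvals:
        raise ValueError("need at least one p-value")
    scaled = sorted(p / gamma for p in pvals)
    idx = max(math.ceil(gamma * len(scaled)) - 1, 0)
    return min(1.0, scaled[idx])


# Hypothetical p-values for one variable from four random splits.
split_pvals = [0.01, 0.02, 0.03, 0.5]
aggregated = aggregate_pvalues(split_pvals, gamma=0.5)
```

Aggregating over many splits makes the final p-value depend less on any single random split, which is the stability and replicability concern the abstract raises.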

preprint | 2021 | arXiv

Regularizing Double Machine Learning in Partially Linear Endogenous Models

The linear coefficient in a partially linear model with confounding variables can be estimated using double machine learning (DML). However, this DML estimator has a two-stage least squares (TSLS) interpretation and may produce overly wide confidence intervals. To address this issue, we propose a regularization and selection scheme, regsDML, which leads to narrower confidence intervals. It selects either the TSLS DML estimator or a regularization-only estimator depending on whose estimated variance is smaller. The regularization-only estimator is tailored to have a low mean squared error. The regsDML estimator is fully data driven. The regsDML estimator converges at the parametric rate, is asymptotically Gaussian distributed, and asymptotically equivalent to the TSLS DML estimator, but regsDML exhibits substantially better finite sample properties. The regsDML estimator uses the idea of k-class estimators, and we show how DML and k-class estimation can be combined to estimate the linear coefficient in a partially linear endogenous model. Empirical examples demonstrate our methodological and theoretical developments. Software code for our regsDML method is available in the R-package dmlalg.

preprint | 2020 | arXiv

Anchor regression: heterogeneous data meets causality

We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the causal minimax problem by considering a modification of the least-squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares and two-stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variables assumptions are violated. If anchor regression and least squares provide the same answer (anchor stability), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.
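
The interpolation between ordinary least squares and two-stage least squares can be made concrete with a small sketch. It assumes one covariate, one anchor variable, and a modified loss of the form ||(I - P_A)(y - xb)||^2 + gamma * ||P_A(y - xb)||^2, where P_A projects onto the anchor. Under that assumption the estimator is plain least squares on data transformed by I - (1 - sqrt(gamma)) P_A. The function names and the toy data are hypothetical.

```python
import math


def center(v):
    """Subtract the mean so that no intercept is needed."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def project_on_anchor(v, a):
    """Project the centered vector v onto the span of the centered anchor a."""
    coef = dot(a, v) / dot(a, a)
    return [coef * ai for ai in a]


def anchor_regression_1d(x, y, a, gamma):
    """One-covariate anchor regression via a data transformation.

    gamma = 1 leaves the data unchanged (ordinary least squares);
    a very large gamma approaches the instrumental-variable ratio.
    """
    x, y, a = center(x), center(y), center(a)
    shrink = 1.0 - math.sqrt(gamma)
    px, py = project_on_anchor(x, a), project_on_anchor(y, a)
    x_t = [xi - shrink * pi for xi, pi in zip(x, px)]
    y_t = [yi - shrink * pi for yi, pi in zip(y, py)]
    return dot(x_t, y_t) / dot(x_t, x_t)


# Hypothetical toy data.
a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x = [0.1, 1.3, 1.9, 3.2, 3.8, 5.1]
y = [0.3, 2.0, 4.5, 5.9, 8.4, 9.9]

ols = anchor_regression_1d(x, y, a, gamma=1.0)
iv_like = anchor_regression_1d(x, y, a, gamma=1e8)
```

Values of gamma between these two extremes trade predictive accuracy on the training distribution against robustness to shifts generated through the anchor.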

preprint | 2020 | arXiv

Deconfounding and Causal Regularization for Stability and External Validity

We review some recent work on removing hidden confounding and causal regularization from a unified viewpoint. We describe how simple and user-friendly techniques improve stability, replicability and distributional robustness in heterogeneous data. In this sense, we provide additional thoughts to the issue on concept drift, raised by Efron (2020), when the data generating distribution is changing.

preprint | 2020 | arXiv

Estimating heterogeneous treatment effects in nonstationary time series with state-space models

Randomized trials and observational studies, more often than not, run over a certain period of time. The treatment effect evolves during this period, which provides crucial insights into the treatment response and the long-term effects. Many conventional methods for estimating treatment effects are limited to the i.i.d. setting and are not suited for inferring the time dynamics of the treatment effect. The time series encountered in these settings are highly informative but often nonstationary due to the changing effects of treatment. This increases the difficulty, since stationarity, a common assumption in time series analysis, cannot be reasonably assumed. Another challenge is the heterogeneity of the treatment effect when the treatment affects units differently. The task of estimating heterogeneous treatment effects from nonstationary and, in particular, interventional time series is highly relevant but has so far remained unexplored. We propose Causal Transfer, a method which combines regression to adjust for confounding with time series modelling to learn the effect of the treatment and how it evolves over time. Causal Transfer does not assume the data to be stationary and can be applied to randomized trials and observational studies in which treatment is confounded. Causal Transfer adjusts the effect for possible confounders and transfers the learned effect to other time series and, thereby, estimates various forms of treatment effects, such as the average treatment effect (ATE) or the conditional average treatment effect (CATE). By learning the time dynamics of the effect, Causal Transfer can also predict the treatment effect for unobserved future time points and determine the long-term consequences of treatment.

preprint | 2020 | arXiv

Spectral Deconfounding via Perturbed Sparse Linear Models

Standard high-dimensional regression methods assume that the underlying coefficient vector is sparse. This might not be true in some cases, in particular in the presence of hidden confounding variables. Such hidden confounding can be represented as a high-dimensional linear model where the sparse coefficient vector is perturbed. For this model, we develop and investigate a class of methods that are based on running the Lasso on preprocessed data. The preprocessing step consists of applying certain spectral transformations that change the singular values of the design matrix. We show that, under some assumptions, one can achieve the optimal $\ell_1$-error rate for estimating the underlying sparse coefficient vector. Our theory also covers the Lava estimator (Chernozhukov et al. [2017]) for a special model class. The performance of the method is illustrated on simulated data and a genomic dataset.
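
The preprocessing step described in this abstract can be written out as follows. This is a sketch under stated assumptions: the symbols $U$, $D$, $V$, $F$, $\tilde d_i$, $\tau$ and $\lambda$ are introduced here for illustration, and capping the singular values at a threshold $\tau$ (for example the median singular value) is shown as one possible choice of spectral transformation, not necessarily the only one the paper analyses.

```latex
% Singular value decomposition of the n x p design matrix, rank r
X = U D V^\top, \qquad D = \operatorname{diag}(d_1, \dots, d_r)

% Spectral transformation: keep the singular vectors, change the singular values
F = U \, \operatorname{diag}\!\left( \frac{\tilde d_1}{d_1}, \dots, \frac{\tilde d_r}{d_r} \right) U^\top,
\qquad \tilde d_i = \min(d_i, \tau) \quad \text{(illustrative trimming choice)}

% Lasso on the preprocessed data (FX, FY)
\hat\beta = \arg\min_{b \in \mathbb{R}^p} \; \frac{1}{n} \lVert F (Y - X b) \rVert_2^2 + \lambda \lVert b \rVert_1
```

Shrinking the largest singular values dampens the directions in which a dense hidden-confounding perturbation would be most pronounced, while the sparse part of the signal is largely preserved.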

preprint | 2019 | arXiv

Hierarchical inference for genome-wide association studies: a view on methodology with software

We provide a view on high-dimensional statistical inference for genome-wide association studies (GWAS). It is in part a review but also covers new developments for meta-analysis with multiple studies and novel software in the form of the R-package hierinf. Inference and assessment of significance are based on very high-dimensional multivariate (generalized) linear models: in contrast to the often-used marginal approaches, this provides a step towards more causal-oriented inference.