Source author record

Stijn Vansteelandt

Stijn Vansteelandt appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Methodology, Applications, Genomics, Machine Learning, math.ST, Statistics Theory

Catalog footprint

What is connected

20 works

6 topics

4 close collaborators

preprint, 2026, arXiv

On estimands in target trial emulation

The target trial framework enables causal inference from longitudinal observational data by emulating randomized trials initiated at multiple time points. Precision is often improved by pooling information across trials, with standard models typically assuming - among other things - a time-constant treatment effect. However, this obscures interpretation when the true treatment effect varies, which we argue to be likely as a result of relying on noncollapsible estimands. To address these challenges, this paper introduces a model-free strategy for target trial analysis, centered around the choice of the estimand, rather than model specification. This ensures that treatment effects remain clearly interpretable for well-defined populations even under model misspecification. We propose estimands suitable for different study designs, and develop accompanying G-computation and inverse probability weighted estimators. Applications on simulations and real data on antimicrobial de-escalation in an intensive care unit setting demonstrate the greater clarity and reliability of the proposed methodology over traditional techniques.
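As a rough illustration of the two estimator families named in the abstract, the sketch below computes G-computation and inverse probability weighted estimates at a single time point with one binary confounder. The toy data and the names `DATA`, `g_computation` and `ipw` are hypothetical. This is not the paper's pooled target-trial estimator, only the basic single-trial idea, assuming no unmeasured confounding and positivity.

```python
# Hypothetical toy data: (confounder x, treatment a, outcome y).
DATA = (
    [(0, 0, y) for y in (0, 0, 1, 0)]
    + [(0, 1, y) for y in (1, 0)]
    + [(1, 0, y) for y in (1, 0)]
    + [(1, 1, y) for y in (1, 1, 1, 0)]
)


def g_computation(data, a):
    """Standardise stratum-specific outcome means over the confounder distribution."""
    n = len(data)
    total = 0.0
    for x in {row[0] for row in data}:
        stratum = [row for row in data if row[0] == x]
        outcomes = [y for (_, ai, y) in stratum if ai == a]
        # Fails with ZeroDivisionError if no one in the stratum received a
        # (a positivity violation).
        total += (len(stratum) / n) * (sum(outcomes) / len(outcomes))
    return total


def ipw(data, a):
    """Weight each subject who received a by 1 / P(A = a | X), then average over everyone."""
    n = len(data)
    total = 0.0
    for x, ai, y in data:
        if ai != a:
            continue
        stratum = [row for row in data if row[0] == x]
        propensity = sum(1 for row in stratum if row[1] == a) / len(stratum)
        total += y / propensity
    return total / n


def naive_difference(data):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for (_, a, y) in data if a == 1]
    control = [y for (_, a, y) in data if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


effect_g = g_computation(DATA, 1) - g_computation(DATA, 0)
effect_ipw = ipw(DATA, 1) - ipw(DATA, 0)
print(effect_g, effect_ipw, naive_difference(DATA))
```

With fully nonparametric (stratified) nuisance estimates, as here, the two adjusted estimators agree, while the unadjusted contrast differs because treatment is more common in the stratum with the higher baseline risk.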

preprint, 2022, arXiv

A statistical test to reject the structural interpretation of a latent factor model

Factor analysis is often used to assess whether a single univariate latent variable is sufficient to explain most of the covariance among a set of indicators for some underlying construct. When evidence suggests that a single factor is adequate, research often proceeds by using a univariate summary of the indicators in subsequent research. Implicit in such practices is the assumption that it is the underlying latent, rather than the indicators, that is causally efficacious. The assumption that the indicators do not have effects on anything subsequent, and that they are themselves only affected by antecedents through the underlying latent is a strong assumption, effectively imposing a structural interpretation on the latent factor model. In this paper, we show that this structural assumption has empirically testable implications, even though the latent variable itself is unobserved. We develop a statistical test to potentially reject the structural interpretation of a latent factor model. We apply this test to data concerning associations between the Satisfaction-with-Life-Scale and subsequent all-cause mortality, which provides strong evidence against a structural interpretation for a univariate latent underlying the scale. Discussion is given to the implications of this result for the development, evaluation, and use of measures and for the use of factor analysis itself.

preprint, 2022, arXiv

IV estimation of causal hazard ratio

Cox's proportional hazards model is one of the most popular statistical models to evaluate associations of exposure with a censored failure time outcome. When confounding factors are not fully observed, the exposure hazard ratio estimated using a Cox model is subject to unmeasured confounding bias. To address this, we propose a novel approach for the identification and estimation of the causal hazard ratio in the presence of unmeasured confounding factors. Our approach is based on a binary instrumental variable, and an additional no-interaction assumption in a first stage regression of the treatment on the IV and unmeasured confounders. We propose, to the best of our knowledge, the first consistent estimator of the (population) causal hazard ratio within an instrumental variable framework. A version of our estimator admits a closed-form representation. We derive the asymptotic distribution of our estimator, and provide a consistent estimator for its asymptotic variance. Our approach is illustrated via simulation studies and a data application.

preprint, 2022, arXiv

On Estimation and Cross-validation of Dynamic Treatment Regimes with Competing Risks

The optimal moment to start renal replacement therapy in a patient with acute kidney injury (AKI) remains a challenging problem in intensive care nephrology. Multiple randomised controlled trials have tried to answer this question, but these can, by definition, only analyse a limited number of treatment initiation strategies. In view of this, we use routinely collected observational data from the Ghent University Hospital intensive care units (ICUs) to investigate different pre-specified timing strategies for renal replacement therapy initiation based on time-updated levels of serum potassium, pH and fluid balance in critically ill patients with AKI with the aim to minimize 30-day ICU mortality. For this purpose, we apply statistical techniques for evaluating the impact of specific dynamic treatment regimes in the presence of ICU discharge as a competing event. We discuss two approaches, a non-parametric one - using an inverse probability weighted Aalen-Johansen estimator - and a semiparametric one - using dynamic-regime marginal structural models. Furthermore, we suggest an easy to implement cross-validation technique that can be used for the out-of-sample performance assessment of the optimal dynamic treatment regime. Our work illustrates the potential of data-driven medical decision support based on routinely collected observational data.

preprint, 2021, arXiv

Demystifying statistical learning based on efficient influence functions

Evaluation of treatment effects and more general estimands is typically achieved via parametric modelling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g. statistical/machine learning) is commonly employed to reduce the risk of misspecification. Naive use of such methods, however, delivers estimators whose bias may shrink too slowly with sample size for inferential methods to perform well, including those based on the bootstrap. Bias arises because standard data-adaptive methods are tuned towards minimal prediction error as opposed to e.g. minimal MSE in the estimator. This may cause excess variability that is difficult to acknowledge, due to the complexity of such strategies. Building on results from non-parametric statistics, targeted learning and debiased machine learning overcome these problems by constructing estimators using the estimand's efficient influence function under the non-parametric model. These increasingly popular methodologies typically assume that the efficient influence function is given, or that the reader is familiar with its derivation. In this paper, we focus on derivation of the efficient influence function and explain how it may be used to construct statistical/machine-learning-based estimators. We discuss the requisite conditions for these estimators to perform well and use diverse examples to convey the broad applicability of the theory.
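For a concrete case of the kind of object the abstract refers to, the block below writes out the well-known efficient influence function of the average treatment effect and the resulting one-step (augmented inverse probability weighted) estimator. The notation (observed data O = (X, A, Y), propensity score π, outcome regression m, estimand ψ) is introduced here for illustration and is not taken from the abstract.

```latex
% Estimand: average treatment effect with binary A, covariates X, outcome Y
\psi = E\{ m(1, X) - m(0, X) \}, \qquad
m(a, x) = E(Y \mid A = a, X = x), \qquad
\pi(x) = P(A = 1 \mid X = x)

% Efficient influence function under the nonparametric model
\varphi(O) = \left\{ \frac{A}{\pi(X)} - \frac{1 - A}{1 - \pi(X)} \right\}
             \{ Y - m(A, X) \} + m(1, X) - m(0, X) - \psi

% One-step (augmented IPW) estimator with estimated nuisance functions
\hat\psi = \frac{1}{n} \sum_{i=1}^{n} \left[
             \left\{ \frac{A_i}{\hat\pi(X_i)} - \frac{1 - A_i}{1 - \hat\pi(X_i)} \right\}
             \{ Y_i - \hat m(A_i, X_i) \} + \hat m(1, X_i) - \hat m(0, X_i) \right]
```

The estimator is the plug-in contrast of fitted outcome regressions plus a weighted-residual correction term, which is what allows data-adaptive fits of π and m to be used while retaining valid inference under suitable rate conditions.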

preprint, 2021, arXiv

Sensitivity Analysis for Unmeasured Confounding via Effect Extrapolation

Inferring the causal effect of a non-randomly assigned exposure on an outcome requires adjusting for common causes of the exposure and outcome to avoid biased conclusions. Notwithstanding the efforts investigators routinely make to measure and adjust for such common causes (or confounders), some confounders typically remain unmeasured, raising the prospect of biased inference in observational studies. Therefore, it is crucial that investigators can practically assess their substantive conclusions' relative (in)sensitivity to potential unmeasured confounding. In this article, we propose a sensitivity analysis strategy that is informed by the stability of the exposure effect over different, well-chosen subsets of the measured confounders. The proposal entails first approximating the process for recording confounders to learn about how the effect is potentially affected by varying amounts of unmeasured confounding, then extrapolating to the effect had hypothetical unmeasured confounders been additionally adjusted for. A large set of measured confounders can thus be exploited to provide insight into the likely presence of unmeasured confounding bias, albeit under an assumption about how data on the confounders are recorded. The proposal's ability to reveal the true effect and ensure valid inference after extrapolation is empirically compared with existing methods using simulation studies. We demonstrate the procedure using two different publicly available datasets commonly used for causal inference.

preprint, 2020, arXiv

A novel estimand to adjust for rescue treatment in clinical trials

The interpretation of randomised clinical trial results is often complicated by intercurrent events. For instance, rescue medication is sometimes given to patients in response to worsening of their disease, either in addition to the randomised treatment or in its place. The use of such medication complicates the interpretation of the intention-to-treat analysis. In view of this, we propose a novel estimand defined as the intention-to-treat effect that would have been observed, had patients on the active arm been switched to rescue medication if and only if they would have been switched when randomised to control. This enables us to disentangle the treatment effect from the effect of rescue medication on a patient's outcome, while avoiding the strong extrapolations that are typically needed when inferring what the intention-to-treat effect would have been in the absence of rescue medication. We develop an inverse probability weighting method to estimate this estimand under specific untestable assumptions, in view of which we propose a sensitivity analysis. We use the method for the analysis of a clinical trial conducted by Janssen Pharmaceuticals, in which chronically ill patients can switch to rescue medication for ethical reasons. Monte Carlo simulations confirm that the proposed estimator is unbiased in moderate sample sizes.

preprint, 2020, arXiv

Assumption-lean inference for generalised linear model parameters

Inference for the parameters indexing generalised linear models is routinely based on the assumption that the model is correct and a priori specified. This is unsatisfactory because the chosen model is usually the result of a data-adaptive model selection process, which may induce excess uncertainty that is not usually acknowledged. Moreover, the assumptions encoded in the chosen model rarely represent some a priori known, ground truth, making standard inferences prone to bias, but also failing to give a pure reflection of the information that is contained in the data. Inspired by developments on assumption-free inference for so-called projection parameters, we here propose novel nonparametric definitions of main effect estimands and effect modification estimands. These reduce to standard main effect and effect modification parameters in generalised linear models when these models are correctly specified, but have the advantage that they continue to capture respectively the primary (conditional) association between two variables, or the degree to which two variables interact (in a statistical sense) in their effect on outcome, even when these models are misspecified. We achieve an assumption-lean inference for these estimands (and thus for the underlying regression parameters) by deriving their influence curve under the nonparametric model and invoking flexible data-adaptive (e.g., machine learning) procedures.

preprint, 2020, arXiv

Confounder selection strategies targeting stable treatment effect estimators

Inferring the causal effect of a treatment on an outcome in an observational study requires adjusting for observed baseline confounders to avoid bias. However, adjusting for all observed baseline covariates, when only a subset are confounders of the effect of interest, is known to yield potentially inefficient and unstable estimators of the treatment effect. Furthermore, it raises the risk of finite-sample bias and bias due to model misspecification. For these stated reasons, confounder (or covariate) selection is commonly used to determine a subset of the available covariates that is sufficient for confounding adjustment. In this article, we propose a confounder selection strategy that focuses on stable estimation of the treatment effect. In particular, when the propensity score model already includes covariates that are sufficient to adjust for confounding, then the addition of covariates that are associated with either treatment or outcome alone, but not both, should not systematically change the effect estimator. The proposal, therefore, entails first prioritizing covariates for inclusion in the propensity score model, then using a change-in-estimate approach to select the smallest adjustment set that yields a stable effect estimate. The ability of the proposal to correctly select confounders, and to ensure valid inference of the treatment effect following data-driven covariate selection, is assessed empirically and compared with existing methods using simulation studies. We demonstrate the procedure using three different publicly available datasets commonly used for causal inference.

preprint, 2020, arXiv

Doubly robust tests of exposure effects under high-dimensional confounding

After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite-sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g. the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample-splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University Intensive Care Unit.

preprint, 2020, arXiv

Efficient, Doubly Robust Estimation of the Effect of Dose Switching for Switchers in a Randomised Clinical Trial

Motivated by a clinical trial conducted by Janssen Pharmaceuticals in which a flexible dosing regimen is compared to placebo, we evaluate how switchers in the treatment arm (i.e., patients who were switched to the higher dose) would have fared had they been kept on the low dose. We do this to understand whether flexible dosing is potentially beneficial for them. Simply comparing these patients' responses with those of patients who stayed on the low dose is unsatisfactory because the latter patients are usually in a better health condition. Because the available information in the considered trial is too scarce to enable a reliable adjustment, we will instead transport data from a fixed dosing trial that has been conducted concurrently on the same target, albeit not in an identical patient population. In particular, we will propose an estimator which relies on an outcome model and a propensity score model for the association between study and patient characteristics. The proposed estimator is asymptotically unbiased if at least one of the two models is correctly specified, and efficient (under the model defined by the restrictions on the propensity score) when both models are correctly specified. We show that the proposed method for using results from an external study is generically applicable in studies where a classical confounding adjustment is not possible due to positivity violation (e.g., studies where switching takes place in a deterministic manner). Monte Carlo simulations and application to the motivating study demonstrate adequate performance.

preprint, 2020, arXiv

Heterogeneous Indirect Effects for Multiple Mediators using Interventional Effect Models

Decomposing an exposure effect on an outcome into separate natural indirect effects through multiple mediators requires strict assumptions, such as correctly postulating the causal structure of the mediators, and no unmeasured confounding among the mediators. In contrast, interventional indirect effects for multiple mediators can be identified even when - as often - the mediators either have an unknown causal structure, or share unmeasured common causes, or both. Existing estimation methods for interventional indirect effects require calculating each distinct indirect effect in turn. This can quickly become unwieldy or unfeasible, especially when investigating indirect effect measures that may be modified by observed baseline characteristics. In this article, we introduce simplified estimation procedures for such heterogeneous interventional indirect effects using interventional effect models. Interventional effect models are a class of marginal structural models that encode the interventional indirect effects as causal model parameters, thus readily permitting effect modification by baseline covariates using (statistical) interaction terms. The mediators and outcome can be continuous or noncontinuous. We propose two estimation procedures: one using inverse weighting by the counterfactual mediator density or mass functions, and another using Monte Carlo integration. The former has the advantage of not requiring an outcome model, but is susceptible to finite sample biases due to highly variable weights. The latter has the advantage of consistent estimation under a correctly specified (parametric) outcome model, but is susceptible to biases due to extrapolation.

preprint, 2020, arXiv

Longitudinal mediation analysis of time-to-event endpoints in the presence of competing risks

This proposal is motivated by an analysis of the English Longitudinal Study of Ageing (ELSA), which aims to investigate the role of loneliness in explaining the negative impact of hearing loss on dementia. The methodological challenges that complicate this mediation analysis include the use of a time-to-event endpoint subject to competing risks, as well as the presence of feedback relationships between the mediator and confounders that are both repeatedly measured over time. To account for these challenges, we introduce natural effect proportional (cause-specific) hazard models. These extend marginal structural proportional (cause-specific) hazard models to enable effect decomposition. We show that under certain causal assumptions, the path-specific direct and indirect effects indexing this model are identifiable from the observed data. We next propose an inverse probability weighting approach to estimate these effects. On the ELSA data, this approach reveals little evidence that the total effect of hearing loss on dementia is mediated through the feeling of loneliness, with a non-statistically significant indirect effect equal to 1.012 (hazard ratio (HR) scale; 95% confidence interval (CI) 0.986 to 1.053).

preprint, 2020, arXiv

Non-linear Mediation Analysis with High-dimensional Mediators whose Causal Structure is Unknown

With multiple potential mediators on the causal pathway from a treatment to an outcome, we consider the problem of decomposing the effects along multiple possible causal path(s) through each distinct mediator. Under Pearl's path-specific effects framework (Pearl, 2001; Avin et al., 2005), such fine-grained decompositions necessitate stringent assumptions, such as correctly specifying the causal structure among the mediators, and there being no unobserved confounding among the mediators. In contrast, interventional direct and indirect effects for multiple mediators (Vansteelandt and Daniel, 2017) can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult when there is a high-dimensional set of possibly continuous and non-continuous mediators. In this article, we avoid the need to model this distribution, by developing a definition of interventional effects previously suggested by VanderWeele and Tchetgen Tchetgen (2017) for longitudinal mediation. We propose a novel estimation strategy that uses non-parametric estimates of the (counterfactual) mediator distributions. Non-continuous outcomes can be accommodated using non-linear outcome models. Estimation proceeds via Monte Carlo integration. The procedure is illustrated using publicly available genomic data (Huang and Pan, 2016) to assess the causal effect of a microRNA expression on the three-month mortality of brain cancer patients that is potentially mediated by expression values of multiple genes.

preprint, 2020, arXiv

Principled Selection of Baseline Covariates to Account for Censoring in Randomized Trials with a Survival Endpoint

The analysis of randomized trials with time-to-event endpoints is nearly always plagued by the problem of censoring. As the censoring mechanism is usually unknown, analyses typically employ the assumption of non-informative censoring. While this assumption usually becomes more plausible as more baseline covariates are being adjusted for, such adjustment also raises concerns. Pre-specification of which covariates will be adjusted for (and how) is difficult, thus prompting the use of data-driven variable selection procedures, which may prevent valid inferences from being drawn. The adjustment for covariates moreover adds concerns about model misspecification, and the fact that each change in adjustment set also changes the censoring assumption and the treatment effect estimand. In this paper, we discuss these concerns and propose a simple variable selection strategy that aims to produce a valid test of the null in large samples. The proposal can be implemented using off-the-shelf software for (penalized) Cox regression, and is empirically found to work well in simulation studies and real data analyses.

preprint, 2020, arXiv

Simulating longitudinal data from marginal structural models using the additive hazard model

Observational longitudinal data on treatments and covariates are increasingly used to investigate treatment effects, but are often subject to time-dependent confounding. Marginal structural models (MSMs), estimated using inverse probability of treatment weighting or the g-formula, are popular for handling this problem. With increasing development of advanced causal inference methods, it is important to be able to assess their performance in different scenarios to guide their application. Simulation studies are a key tool for this, but their use to evaluate causal inference methods has been limited. This paper focuses on the use of simulations for evaluations involving MSMs in studies with a time-to-event outcome. In a simulation, it is important to be able to generate the data in such a way that the correct form of any models to be fitted to those data is known. However, this is not straightforward in the longitudinal setting because it is natural for data to be generated in a sequential conditional manner, whereas MSMs involve fitting marginal rather than conditional hazard models. We provide general results that enable the form of the correctly-specified MSM to be derived based on a conditional data generating procedure, and show how the results can be applied when the conditional hazard model is an Aalen additive hazard or Cox model. Using conditional additive hazard models is advantageous because they imply additive MSMs that can be fitted using standard software. We describe and illustrate a simulation algorithm. Our results will help researchers to effectively evaluate causal inference methods via simulation.

preprint, 2015, arXiv

Robustness and efficiency of covariate adjusted linear instrumental variable estimators

Two-stage least squares (TSLS) estimators and variants thereof are widely used to infer the effect of an exposure on an outcome using instrumental variables (IVs). They belong to a wider class of two-stage IV estimators, which are based on fitting a conditional mean model for the exposure, and then using the fitted exposure values along with the covariates as predictors in a linear model for the outcome. We show that standard TSLS estimators enjoy greater robustness to model misspecification than more general two-stage estimators. However, by potentially using a wrong exposure model, e.g. when the exposure is binary, they tend to be inefficient. In view of this, we study double-robust G-estimators instead. These use working models for the exposure, IV and outcome but only require correct specification of either the IV model or the outcome model to guarantee consistent estimation of the exposure effect. As the finite sample performance of the locally efficient G-estimator can be poor, we further develop G-estimation procedures with improved efficiency and robustness properties under misspecification of some or all working models. Simulation studies and a data analysis demonstrate drastic improvements, with remarkably good performance even when one or more working models are misspecified.
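The two-stage procedure described in the abstract can be sketched in its simplest form: one instrument, one exposure and no covariates. The toy data and the names `ols_slope_intercept` and `tsls` are hypothetical, and covariate adjustment and the G-estimators studied in the paper are deliberately left out. The data are constructed so that the true exposure effect is 2 and an unmeasured variable `U` confounds the exposure-outcome relation.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tsls(z, x, y):
    """Two-stage least squares with a single instrument z and no covariates."""
    # Stage 1: model the exposure given the instrument, keep fitted values.
    slope_1, intercept_1 = ols_slope_intercept(z, x)
    x_hat = [intercept_1 + slope_1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted exposure values.
    slope_2, _ = ols_slope_intercept(x_hat, y)
    return slope_2


# Hypothetical toy data: U is an unmeasured confounder, balanced across Z.
Z = [0, 0, 1, 1]
U = [-1, 1, -1, 1]
X = [z + u for z, u in zip(Z, U)]          # exposure depends on Z and U
Y = [2 * x + 3 * u for x, u in zip(X, U)]  # true exposure effect is 2

iv_estimate = tsls(Z, X, Y)
ols_estimate = ols_slope_intercept(X, Y)[0]
print(iv_estimate, ols_estimate)
```

In this constructed example the two-stage estimate recovers the effect of 2, whereas the direct regression of the outcome on the exposure is pulled away from it by `U`.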

preprint, 2015, arXiv

Structural Nested Models and G-estimation: The Partially Realized Promise

Structural nested models (SNMs) and the associated method of G-estimation were first proposed by James Robins over two decades ago as approaches to modeling and estimating the joint effects of a sequence of treatments or exposures. The models and estimation methods have since been extended to dealing with a broader series of problems, and have considerable advantages over the other methods developed for estimating such joint effects. Despite these advantages, the application of these methods in applied research has been relatively infrequent; we view this as unfortunate. To remedy this, we provide an overview of the models and estimation methods as developed, primarily by Robins, over the years. We provide insight into their advantages over other methods, and consider some possible reasons for failure of the methods to be more broadly adopted, as well as possible remedies. Finally, we consider several extensions of the standard models and estimation methods.

preprint, 2012, arXiv

On Instrumental Variables Estimation of Causal Odds Ratios

Inference for causal effects can benefit from the availability of an instrumental variable (IV) which, by definition, is associated with the given exposure, but not with the outcome of interest other than through a causal exposure effect. Estimation methods for instrumental variables are now well established for continuous outcomes, but much less so for dichotomous outcomes. In this article we review IV estimation of so-called conditional causal odds ratios which express the effect of an arbitrary exposure on a dichotomous outcome conditional on the exposure level, instrumental variable and measured covariates. In addition, we propose IV estimators of so-called marginal causal odds ratios which express the effect of an arbitrary exposure on a dichotomous outcome at the population level, and are therefore of greater public health relevance. We explore interconnections between the different estimators and support the results with extensive simulation studies and three applications.
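The distinction between conditional and marginal odds ratios drawn in the abstract can be seen in a small numerical sketch. The risks below are hypothetical and chosen for convenience; the example involves no instrument and no confounding, and only illustrates that the two kinds of odds ratio generally differ (noncollapsibility), not how the paper estimates them.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing exposed with unexposed outcome risk."""
    return odds(p_exposed) / odds(p_unexposed)


# Hypothetical outcome risks in two equally sized covariate strata,
# given as (risk if exposed, risk if unexposed).
STRATA = [(0.8, 0.5), (0.5, 0.2)]

# Conditional odds ratios: one per stratum.
conditional = [odds_ratio(p1, p0) for p1, p0 in STRATA]

# Marginal odds ratio: average the risks over strata first, then compare.
marginal_p1 = sum(p1 for p1, _ in STRATA) / len(STRATA)
marginal_p0 = sum(p0 for _, p0 in STRATA) / len(STRATA)
marginal = odds_ratio(marginal_p1, marginal_p0)

print(conditional, marginal)
```

Both stratum-specific odds ratios equal 4, yet the population-level odds ratio is about 3.45, even though the strata are equally sized and exposure is unrelated to stratum.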

preprint, 2011, arXiv

Direct genetic effects and their estimation from matched case-control data

In genetic association studies, a single marker is often associated with multiple, correlated phenotypes (e.g., obesity and cardiovascular disease, or nicotine dependence and lung cancer). A pervasive question is then whether that marker has independent effects on all phenotypes. In this article, we address this question by assessing whether there is a direct genetic effect on one phenotype that is not mediated through the other phenotypes. In particular, we investigate how to identify and estimate such direct genetic effects on the basis of (matched) case-control data. We discuss conditions under which such effects are identifiable from the available (matched) case-control data. We find that direct genetic effects are sometimes estimable via standard regression methods, and sometimes via a more general G-estimation method, which has previously been proposed for random samples and unmatched case-control studies (Vansteelandt, 2009) and is here extended to matched case-control studies. The results are used to assess whether the FTO gene is associated with myocardial infarction other than via an effect on obesity.
