Source author record

Oliver Dukes

Oliver Dukes appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · Applications · Machine Learning · math.ST · Statistics Theory

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

On estimands in target trial emulation

The target trial framework enables causal inference from longitudinal observational data by emulating randomized trials initiated at multiple time points. Precision is often improved by pooling information across trials, with standard models typically assuming, among other things, a time-constant treatment effect. However, this obscures interpretation when the true treatment effect varies, which we argue to be likely as a result of relying on noncollapsible estimands. To address these challenges, this paper introduces a model-free strategy for target trial analysis, centered around the choice of the estimand rather than model specification. This ensures that treatment effects remain clearly interpretable for well-defined populations even under model misspecification. We propose estimands suitable for different study designs, and develop accompanying G-computation and inverse probability weighted estimators. Applications to simulated data and to real data on antimicrobial de-escalation in an intensive care unit setting demonstrate the greater clarity and reliability of the proposed methodology over traditional techniques.

preprint · 2023 · arXiv

Generalizing the intention-to-treat effect of an active control against placebo from historical placebo-controlled trials to an active-controlled trial: A case study of the efficacy of daily oral TDF/FTC in the HPTN 084 study

In many clinical settings, an active-controlled trial design (e.g., a non-inferiority or superiority design) is often used to compare an experimental medicine to an active control (e.g., an FDA-approved, standard therapy). One prominent example is a recent phase 3 efficacy trial, HIV Prevention Trials Network Study 084 (HPTN 084), comparing long-acting cabotegravir, a new HIV pre-exposure prophylaxis (PrEP) agent, to the FDA-approved daily oral tenofovir disoproxil fumarate plus emtricitabine (TDF/FTC) in a population of heterosexual women in 7 African countries. One key complication of interpreting study results in an active-controlled trial like HPTN 084 is that the placebo arm is not present and the efficacy of the active control (and hence the experimental drug) compared to the placebo can only be inferred by leveraging other data sources. In this article, we study statistical inference for the intention-to-treat (ITT) effect of the active control using relevant historical placebo-controlled trials data under the potential outcomes (PO) framework. We highlight the role of adherence and unmeasured confounding, discuss in detail identification assumptions and two modes of inference (point versus partial identification), propose estimators under identification assumptions permitting point identification, and lay out sensitivity analyses needed to relax identification assumptions. We applied our framework to estimating the intention-to-treat effect of daily oral TDF/FTC versus placebo in HPTN 084 using data from an earlier Phase 3, placebo-controlled trial of daily oral TDF/FTC (Partners PrEP).

preprint · 2021 · arXiv

Demystifying statistical learning based on efficient influence functions

Evaluation of treatment effects and more general estimands is typically achieved via parametric modelling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g. statistical/machine learning) is commonly employed to reduce the risk of misspecification. Naive use of such methods, however, delivers estimators whose bias may shrink too slowly with sample size for inferential methods to perform well, including those based on the bootstrap. Bias arises because standard data-adaptive methods are tuned towards minimal prediction error as opposed to e.g. minimal MSE in the estimator. This may cause excess variability that is difficult to acknowledge, due to the complexity of such strategies. Building on results from non-parametric statistics, targeted learning and debiased machine learning overcome these problems by constructing estimators using the estimand's efficient influence function under the non-parametric model. These increasingly popular methodologies typically assume that the efficient influence function is given, or that the reader is familiar with its derivation. In this paper, we focus on derivation of the efficient influence function and explain how it may be used to construct statistical/machine-learning-based estimators. We discuss the requisite conditions for these estimators to perform well and use diverse examples to convey the broad applicability of the theory.

preprint · 2020 · arXiv

Assumption-lean inference for generalised linear model parameters

Inference for the parameters indexing generalised linear models is routinely based on the assumption that the model is correct and a priori specified. This is unsatisfactory because the chosen model is usually the result of a data-adaptive model selection process, which may induce excess uncertainty that is not usually acknowledged. Moreover, the assumptions encoded in the chosen model rarely represent some a priori known ground truth, making standard inferences prone to bias, but also failing to give a pure reflection of the information that is contained in the data. Inspired by developments on assumption-free inference for so-called projection parameters, we here propose novel nonparametric definitions of main effect estimands and effect modification estimands. These reduce to standard main effect and effect modification parameters in generalised linear models when these models are correctly specified, but have the advantage that they continue to capture, respectively, the primary (conditional) association between two variables, or the degree to which two variables interact (in a statistical sense) in their effect on outcome, even when these models are misspecified. We achieve assumption-lean inference for these estimands (and thus for the underlying regression parameters) by deriving their influence curve under the nonparametric model and invoking flexible data-adaptive (e.g., machine learning) procedures.

preprint · 2020 · arXiv

Doubly robust tests of exposure effects under high-dimensional confounding

After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite-sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g. the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample-splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University Intensive Care Unit.

preprint · 2020 · arXiv

Principled Selection of Baseline Covariates to Account for Censoring in Randomized Trials with a Survival Endpoint

The analysis of randomized trials with time-to-event endpoints is nearly always plagued by the problem of censoring. As the censoring mechanism is usually unknown, analyses typically employ the assumption of non-informative censoring. While this assumption usually becomes more plausible as more baseline covariates are adjusted for, such adjustment also raises concerns. Pre-specification of which covariates will be adjusted for (and how) is difficult, thus prompting the use of data-driven variable selection procedures, which may prevent valid inferences from being drawn. Adjustment for covariates moreover adds concerns about model misspecification, and about the fact that each change in the adjustment set also changes the censoring assumption and the treatment effect estimand. In this paper, we discuss these concerns and propose a simple variable selection strategy that aims to produce a valid test of the null in large samples. The proposal can be implemented using off-the-shelf software for (penalized) Cox regression, and is empirically found to work well in simulation studies and real data analyses.
