Researcher profile

Eric Tchetgen Tchetgen

Eric Tchetgen Tchetgen contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
6 topics
4 close collaborators

Published work

13 published items

Preprint, 2022, arXiv

A Selective Review of Negative Control Methods in Epidemiology

Purpose of Review: Negative controls are a powerful tool to detect and adjust for bias in epidemiological research. This paper introduces negative controls to a broader audience and provides guidance on principled design and causal analysis based on a formal negative control framework. Recent Findings: We review and summarize causal and statistical assumptions, practical strategies, and validation criteria that can be combined with subject matter knowledge to perform negative control analyses. We also review existing statistical methodologies for detection, reduction, and correction of confounding bias, and briefly discuss recent advances towards nonparametric identification of causal effects in a double negative control design. Summary: There is great potential for valid and accurate causal inference leveraging contemporary healthcare data in which negative controls are routinely available. Design and analysis of observational data leveraging negative controls is an area of growing interest in health and social sciences. Despite these developments, further effort is needed to disseminate these novel methods to ensure they are adopted by practicing epidemiologists.
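
The bias-detection idea summarized above can be illustrated with a toy simulation. This is a minimal sketch, not the reviewed paper's methodology: the data-generating process, the function names (`simulate`, `negative_control_gap`), and all numeric settings are assumptions made for illustration. A negative control outcome is constructed so that the exposure has no causal effect on it, so any exposed-versus-unexposed difference in its mean signals confounding.

```python
import random


def mean(values):
    return sum(values) / len(values)


def negative_control_gap(exposure, nc_outcome):
    """Mean negative control outcome among the exposed minus the unexposed."""
    exposed = [n for a, n in zip(exposure, nc_outcome) if a == 1]
    unexposed = [n for a, n in zip(exposure, nc_outcome) if a == 0]
    return mean(exposed) - mean(unexposed)


def simulate(confounded, n=20000, seed=0):
    """Hypothetical data: U is an unmeasured confounder (assumed setup)."""
    rng = random.Random(seed)
    exposure, nc_outcome = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        if confounded:
            # U shifts the probability of exposure.
            p_exposed = 0.8 if u > 0 else 0.2
        else:
            p_exposed = 0.5
        exposure.append(1 if rng.random() < p_exposed else 0)
        # The exposure never enters this equation, so it has no causal
        # effect on the negative control outcome by construction.
        nc_outcome.append(u + rng.gauss(0.0, 1.0))
    return exposure, nc_outcome


gap_confounded = negative_control_gap(*simulate(confounded=True))
gap_unconfounded = negative_control_gap(*simulate(confounded=False))
print("gap with confounding:   ", round(gap_confounded, 3))
print("gap without confounding:", round(gap_unconfounded, 3))
```

In this sketch the gap is clearly nonzero only when U drives both the exposure and the negative control outcome. Correcting for the detected bias, as opposed to detecting it, requires the additional assumptions the review discusses.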

Preprint, 2022, arXiv

Doubly Robust Proximal Causal Inference under Confounded Outcome-Dependent Sampling

Unmeasured confounding and selection bias are often of concern in observational studies and may invalidate a causal analysis if not appropriately accounted for. Under outcome-dependent sampling, a latent factor that has causal effects on the treatment, outcome, and sample selection process may cause both unmeasured confounding and selection bias, rendering standard causal parameters unidentifiable without additional assumptions. Under an odds ratio model for the treatment effect, Li et al. 2022 established both proximal identification and estimation of causal effects by leveraging a pair of negative control variables as proxies of latent factors at the source of both confounding and selection bias. However, their approach relies exclusively on the existence and correct specification of a so-called treatment confounding bridge function, a model that restricts the treatment assignment mechanism. In this article, we propose doubly robust estimation under the odds ratio model with respect to two nuisance functions -- a treatment confounding bridge function and an outcome confounding bridge function that restricts the outcome law, such that our estimator is consistent and asymptotically normal if either bridge function model is correctly specified, without knowing which one is. Thus, our proposed doubly robust estimator is potentially more robust than that of Li et al. 2022. Our simulations confirm that the proposed proximal estimators of an odds ratio causal effect can adequately account for both residual confounding and selection bias under stated conditions with well-calibrated confidence intervals in a wide range of scenarios, where standard methods generally fail to be consistent. In addition, the proposed doubly robust estimator is consistent if at least one confounding bridge function is correctly specified.

Preprint, 2022, arXiv

End-to-End Balancing for Causal Continuous Treatment-Effect Estimation

We study the problem of observational causal inference with continuous treatments in the framework of inverse propensity-score weighting. To obtain stable weights, we design a new algorithm based on entropy balancing that learns weights to directly maximize causal inference accuracy using end-to-end optimization. In the process of optimization, these weights are automatically tuned to the specific dataset and causal inference algorithm being used. We provide a theoretical analysis demonstrating consistency of our approach. Using synthetic and real-world data, we show that our algorithm estimates causal effects more accurately than baseline entropy balancing.
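
For orientation, the baseline that this abstract compares against can be sketched in its simplest form. The code below is not the paper's end-to-end algorithm and does not handle continuous treatments. It assumes a single covariate and a single mean constraint, and the function name `entropy_balance` and the toy data are hypothetical. Under those assumptions, entropy balancing yields weights proportional to `exp(lam * x_i)`, with `lam` chosen so that the weighted covariate mean matches a target.

```python
import math


def entropy_balance(x, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Weights w_i proportional to exp(lam * x_i) whose weighted mean of x
    equals target_mean (one covariate, one moment constraint)."""
    if not (min(x) < target_mean < max(x)):
        raise ValueError("target_mean must lie strictly inside the range of x")

    def weights(lam):
        shift = max(lam * xi for xi in x)  # keeps exponents <= 0
        raw = [math.exp(lam * xi - shift) for xi in x]
        total = sum(raw)
        return [r / total for r in raw]

    def gap(lam):
        return sum(w * xi for w, xi in zip(weights(lam), x)) - target_mean

    # The weighted mean is increasing in lam, so bisection finds the root.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return weights((lo + hi) / 2.0)


# Hypothetical comparison-group covariate values and a target mean of 3.0.
x_control = [0.0, 1.0, 2.0, 3.0, 4.0]
w = entropy_balance(x_control, target_mean=3.0)
weighted_mean = sum(wi * xi for wi, xi in zip(w, x_control))
print([round(wi, 4) for wi in w], round(weighted_mean, 6))
```

Bisection is used here only because the one-dimensional problem is monotone. With several covariates the same exponential-tilting form holds, but the multipliers are usually found with a convex optimizer instead.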

Preprint, 2022, arXiv

Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets

In the United States and elsewhere, risk assessment algorithms are being used to help inform criminal justice decision-makers. A common intent is to forecast an offender's "future dangerousness." Such algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we use counterfactual reasoning to consider the prospects for improved fairness when members of a less privileged group are treated by a risk algorithm as if they are members of a more privileged group. We combine a machine learning classifier trained in a novel manner with an optimal transport adjustment for the relevant joint probability distributions, which together provide a constructive response to claims of bias-in-bias-out. A key distinction is between fairness claims that are empirically testable and fairness claims that are not. We then use confusion tables and conformal prediction sets to evaluate achieved fairness for projected risk. Our data are a random sample of 300,000 offenders in a large metropolitan area in the United States, observed at their arraignments, during which decisions to release or detain are made. We show that substantial improvement in fairness can be achieved consistent with a Pareto improvement for protected groups.

Preprint, 2022, arXiv

IV estimation of causal hazard ratio

Cox's proportional hazards model is one of the most popular statistical models to evaluate associations of exposure with a censored failure time outcome. When confounding factors are not fully observed, the exposure hazard ratio estimated using a Cox model is subject to unmeasured confounding bias. To address this, we propose a novel approach for the identification and estimation of the causal hazard ratio in the presence of unmeasured confounding factors. Our approach is based on a binary instrumental variable and an additional no-interaction assumption in a first-stage regression of the treatment on the IV and unmeasured confounders. We propose, to the best of our knowledge, the first consistent estimator of the (population) causal hazard ratio within an instrumental variable framework. A version of our estimator admits a closed-form representation. We derive the asymptotic distribution of our estimator and provide a consistent estimator for its asymptotic variance. Our approach is illustrated via simulation studies and a data application.

Preprint, 2022, arXiv

Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference

Robins et al. (2008) introduced a class of influence functions (IFs) which could be used to obtain doubly robust moment functions for the corresponding parameters. However, that class does not include the IF of parameters for which the nuisance functions are solutions to integral equations. Such parameters are particularly important in the field of causal inference, specifically in the recently proposed proximal causal inference framework of Tchetgen Tchetgen et al. (2020), which allows for estimating the causal effect in the presence of latent confounders. In this paper, we first extend the class of Robins et al. to include doubly robust IFs in which the nuisance functions are solutions to integral equations. Then we demonstrate that the double robustness property of these IFs can be leveraged to construct estimating equations for the nuisance functions, which enables us to solve the integral equations without resorting to parametric models. We frame the estimation of the nuisance functions as a minimax optimization problem. We provide convergence rates for the nuisance functions and conditions required for asymptotic linearity of the estimator of the parameter of interest. The experimental results demonstrate that our proposed methodology leads to robust and high-performance estimators for the average causal effect in the proximal causal inference framework.

Preprint, 2022, arXiv

Nonparametric inference about mean functionals of nonignorable nonresponse data without identifying the joint distribution

We consider identification and inference about mean functionals of observed covariates and an outcome variable subject to nonignorable missingness. By leveraging a shadow variable, we establish a necessary and sufficient condition for identification of the mean functional even if the full data distribution is not identified. We further characterize a necessary condition for $\sqrt{n}$-estimability of the mean functional. This condition naturally strengthens the identifying condition, and it requires the existence of a function as a solution to a representer equation that connects the shadow variable to the mean functional. Solutions to the representer equation may not be unique, which presents substantial challenges for nonparametric estimation; standard theories for nonparametric sieve estimators are not applicable here. We construct a consistent estimator for the solution set and then adapt the theory of extremum estimators to find from the estimated set a consistent estimator for an appropriately chosen solution. The estimator is asymptotically normal, locally efficient, and attains the semiparametric efficiency bound under certain regularity conditions. We illustrate the proposed approach via simulations and a real data application on home pricing.

Preprint, 2022, arXiv

Selective Machine Learning of the Average Treatment Effect with an Invalid Instrumental Variable

Instrumental variable methods have been widely used to identify causal effects in the presence of unmeasured confounding. A key identification condition known as the exclusion restriction states that the instrument cannot have a direct effect on the outcome which is not mediated by the exposure in view. In the health and social sciences, such an assumption is often not credible. To address this concern, we consider identification conditions of the population average treatment effect with an invalid instrumental variable which does not satisfy the exclusion restriction, and derive the efficient influence function targeting the identifying functional under a nonparametric observed data model. We propose a novel multiply robust locally efficient estimator of the average treatment effect that is consistent in the union of multiple parametric nuisance models, as well as a multiply debiased machine learning estimator for which the nuisance parameters are estimated using generic machine learning methods that effectively exploit various forms of linear or nonlinear structured sparsity in the nuisance parameter space. When one cannot be confident that any of these machine learners is consistent at sufficiently fast rates to ensure $\sqrt{n}$-consistency for the average treatment effect, we introduce a new criterion for selective machine learning which leverages the multiple robustness property in order to ensure small bias. The proposed methods are illustrated through extensive simulations and a data analysis evaluating the causal effect of 401(k) participation on savings.
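
To make the role of the exclusion restriction concrete, here is a minimal sketch of the classical Wald ratio estimator for a binary instrument. It is the standard estimator that relies on the exclusion restriction, not the multiply robust or selective machine learning estimators proposed in this paper. The function name `wald_estimate` and the toy data are hypothetical.

```python
def wald_estimate(z, a, y):
    """Wald ratio: (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0]).

    z: binary instrument, a: exposure, y: outcome (equal-length lists).
    Interpreting the ratio causally assumes, among other conditions, that
    the instrument affects the outcome only through the exposure.
    """
    def mean_where(values, flag):
        selected = [v for v, zi in zip(values, z) if zi == flag]
        return sum(selected) / len(selected)

    numerator = mean_where(y, 1) - mean_where(y, 0)
    denominator = mean_where(a, 1) - mean_where(a, 0)
    if denominator == 0:
        raise ValueError("instrument is unrelated to the exposure in this sample")
    return numerator / denominator


# Toy data (assumed): the outcome equals 1 + 2 * exposure, with no direct
# effect of the instrument on the outcome.
z = [0, 0, 0, 0, 1, 1, 1, 1]
a = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 1, 1, 3, 1, 3, 3, 3]
print(wald_estimate(z, a, y))  # prints 2.0
```

If the instrument also affected the outcome directly, the numerator would pick up that direct effect and the ratio would no longer equal the exposure effect, which is the failure mode the abstract addresses.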

Preprint, 2022, arXiv

Semiparametric Efficient G-estimation with Invalid Instrumental Variables

The instrumental variable method is widely used in the health and social sciences for identification and estimation of causal effects in the presence of potentially unmeasured confounding. In order to improve efficiency, multiple instruments are routinely used, leading to concerns about bias due to possible violation of the instrumental variable assumptions. To address this concern, we introduce a new class of g-estimators that are guaranteed to remain consistent and asymptotically normal for the causal effect of interest provided that a set of at least $\gamma$ out of $K$ candidate instruments are valid, for $\gamma \leq K$ set by the analyst ex ante, without necessarily knowing the identities of the valid and invalid instruments. We provide formal semiparametric efficiency theory supporting our results. Both simulation studies and applications to the UK Biobank data demonstrate the superior empirical performance of our estimators compared to competing methods.

Preprint, 2022, arXiv

Validating Causal Inference Methods

The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no 'one-size-fits-all' causal method that performs optimally in every setting. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generative procedures can be of limited value because they are typically stylized models of reality. They are simplified for tractability and lack the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework's novelty stems from its ability to generate synthetic data anchored at the empirical distribution for the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. Thus, simulated data sets are used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications from the Lalonde and Project STAR studies.

Preprint, 2020, arXiv

A semiparametric instrumental variable approach to optimal treatment regimes under endogeneity

There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a general instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we establish identification of both the value function $E[Y_{\mathcal{D}(L)}]$ for a given regime $\mathcal{D}$ and optimal regimes $\arg\max_{\mathcal{D}} E[Y_{\mathcal{D}(L)}]$ with the aid of a binary instrumental variable, when no unmeasured confounding fails to hold. We also construct novel multiply robust classification-based estimators. Furthermore, we propose to identify and estimate optimal treatment regimes among those who would comply with the assigned treatment under a standard monotonicity assumption. In this latter case, we establish the somewhat surprising result that complier optimal regimes can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. Our approach is illustrated via extensive simulation studies and a data application on the effect of child rearing on labor participation.

Preprint, 2020, arXiv

Instrumental Variable Estimation of Marginal Structural Mean Models for Time-Varying Treatment

Robins 1997 introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. In his work, identification of MSM parameters is established under a sequential randomization assumption (SRA), which rules out unmeasured confounding of treatment assignment over time. We consider sufficient conditions for identification of the parameters of a subclass, Marginal Structural Mean Models (MSMMs), when sequential randomization fails to hold due to unmeasured confounding, using instead a time-varying instrumental variable. Our identification conditions require that no unobserved confounder predicts compliance type for the time-varying treatment. We describe a simple weighted estimator and examine its finite-sample properties in a simulation study. We apply the proposed estimator to examine the effect of delivery hospital on neonatal survival probability.

Preprint, 2020, arXiv

Regression-based Negative Control of Homophily in Dyadic Peer Effect Analysis

A prominent threat to causal inference about peer effects over social networks is the presence of homophily bias, that is, social influence between friends and families is entangled with common characteristics or underlying similarities that form close connections. Analysis of social network data has suggested that certain health conditions such as obesity and psychological states including happiness and loneliness can spread over a network. However, such analyses of peer effects or contagion effects have come under criticism because homophily bias may compromise the causal statement. We develop a regression-based approach which leverages a negative control exposure for identification and estimation of contagion effects on additive or multiplicative scales, in the presence of homophily bias. We apply our methods to evaluate the peer effect of obesity in the Framingham Offspring Study.