Source author record

Ilya Shpitser

Ilya Shpitser appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Methodology Artificial Intelligence math.ST Statistics Theory Computation and Language econ.EM

Catalog footprint

What is connected

23works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference

Robins et al. (2008) introduced a class of influence functions (IFs) which could be used to obtain doubly robust moment functions for the corresponding parameters. However, that class does not include the IF of parameters for which the nuisance functions are solutions to integral equations. Such parameters are particularly important in the field of causal inference, specifically in the recently proposed proximal causal inference framework of Tchetgen Tchetgen et al. (2020), which allows for estimating the causal effect in the presence of latent confounders. In this paper, we first extend the class of Robins et al. to include doubly robust IFs in which the nuisance functions are solutions to integral equations. Then we demonstrate that the double robustness property of these IFs can be leveraged to construct estimating equations for the nuisance functions, which enables us to solve the integral equations without resorting to parametric models. We frame the estimation of the nuisance functions as a minimax optimization problem. We provide convergence rates for the nuisance functions and conditions required for asymptotic linearity of the estimator of the parameter of interest. The experiment results demonstrate that our proposed methodology leads to robust and high-performance estimators for average causal effect in the proximal causal inference framework.

preprint2022arXiv

Optimal Training of Fair Predictive Models

Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.

preprint2021arXiv

Differentiable Causal Discovery Under Unmeasured Confounding

The data drawn from biological, economic, and social systems are often confounded due to the presence of unmeasured variables. Prior work in causal discovery has focused on discrete search procedures for selecting acyclic directed mixed graphs (ADMGs), specifically ancestral ADMGs, that encode ordinary conditional independence constraints among the observed variables of the system. However, confounded systems also exhibit more general equality restrictions that cannot be represented via these graphs, placing a limit on the kinds of structures that can be learned using ancestral ADMGs. In this work, we derive differentiable algebraic constraints that fully characterize the space of ancestral ADMGs, as well as more general classes of ADMGs, arid ADMGs and bow-free ADMGs, that capture all equality restrictions on the observed variables. We use these constraints to cast causal discovery as a continuous optimization problem and design differentiable procedures to find the best fitting ADMG when the data comes from a confounded linear system of equations with correlated errors. We demonstrate the efficacy of our method through simulations and application to a protein expression dataset. Code implementing our methods is open-source and publicly available at https://gitlab.com/rbhatta8/dcd and will be incorporated into the Ananke package.

preprint2021arXiv

Generating Synthetic Text Data to Evaluate Causal Inference Methods

Drawing causal conclusions from observational data requires making assumptions about the true data-generating process. Causal inference research typically considers low-dimensional data, such as categorical or numerical fields in structured medical records. High-dimensional and unstructured data such as natural language complicates the evaluation of causal inference methods; such evaluations rely on synthetic datasets with known causal effects. Models for natural language generation have been widely studied and perform well empirically. However, existing methods not immediately applicable to producing synthetic datasets for causal evaluations, as they do not allow for quantifying a causal effect on the text itself. In this work, we develop a framework for adapting existing generation models to produce synthetic text datasets with known causal effects. We use this framework to perform an empirical comparison of four recently-proposed methods for estimating causal effects from text data. We release our code and synthetic datasets.

preprint2020arXiv

A Semiparametric Approach to Interpretable Machine Learning

Black box models in machine learning have demonstrated excellent predictive performance in complex problems and high-dimensional settings. However, their lack of transparency and interpretability restrict the applicability of such models in critical decision-making processes. In order to combat this shortcoming, we propose a novel approach to trading off interpretability and performance in prediction models using ideas from semiparametric statistics, allowing us to combine the interpretability of parametric regression models with performance of nonparametric methods. We achieve this by utilizing a two-piece model: the first piece is interpretable and parametric, to which a second, uninterpretable residual piece is added. The performance of the overall model is optimized using methods from the sufficient dimension reduction literature. Influence function based estimators are derived and shown to be doubly robust. This allows for use of approaches such as double Machine Learning in estimating our model parameters. We illustrate the utility of our approach via simulation studies and a data application based on predicting the length of stay in the intensive care unit among surgery patients.

preprint2020arXiv

Causal inference, social networks, and chain graphs

Traditionally, statistical and causal inference on human subjects rely on the assumption that individuals are independently affected by treatments or exposures. However, recently there has been increasing interest in settings, such as social networks, where individuals may interact with one another such that treatments may spill over from the treated individual to their social contacts and outcomes may be contagious. Existing models proposed for causal inference using observational data from networks of interacting individuals have two major shortcomings. First, they often require a level of granularity in the data that is practically infeasible to collect in most settings, and second, the models are high-dimensional and often too big to fit to the available data. In this paper we illustrate and justify a parsimonious parameterization for network data with interference and contagion. Our parameterization corresponds to a particular family of graphical models known as chain graphs. We argue that, in some settings, chain graph models approximate the marginal distribution of a snapshot of a longitudinal data generating process on interacting units. We illustrate the use of chain graphs for causal inference about collective decision making in social networks using data from U.S. Supreme Court decisions between 1994 and 2004 and in simulations.

preprint2020arXiv

Counterexamples to "The Blessings of Multiple Causes" by Wang and Blei

This note has been updated (April, 2020) to respond to "Towards Clarifying the Theory of the Deconfounder" by Yixin Wang, David M. Blei (arXiv:2003.04948). This original note, posted in January, 2020, is meant to complement our previous comment on "The Blessings of Multiple Causes" by Wang and Blei (2019). We provide a more succinct and transparent explanation of the fact that the deconfounder does not control for multi-cause confounding. The argument given in Wang and Blei (2019) makes two mistakes: (1) attempting to infer independence conditional on one variable from independence conditional on a different, unrelated variable, and (2) attempting to infer joint independence from pairwise independence. We give two simple counterexamples to the deconfounder claim.

preprint2020arXiv

Deriving Bounds and Inequality Constraints Using LogicalRelations Among Counterfactuals

Causal parameters may not be point identified in the presence of unobserved confounding. However, information about non-identified parameters, in the form of bounds, may still be recovered from the observed data in some cases. We develop a new general method for obtaining bounds on causal parameters using rules of probability and restrictions on counterfactuals implied by causal graphical models. We additionally provide inequality constraints on functionals of the observed data law implied by such causal models. Our approach is motivated by the observation that logical relations between identified and non-identified counterfactual events often yield information about non-identified events. We show that this approach is powerful enough to recover known sharp bounds and tight inequality constraints, and to derive novel bounds and constraints.

preprint2020arXiv

Full Law Identification In Graphical Models Of Missing Data: Completeness Results

Missing data has the potential to affect analyses conducted in all fields of scientific study, including healthcare, economics, and the social sciences. Several approaches to unbiased inference in the presence of non-ignorable missingness rely on the specification of the target distribution and its missingness process as a probability distribution that factorizes with respect to a directed acyclic graph. In this paper, we address the longstanding question of the characterization of models that are identifiable within this class of missing data distributions. We provide the first completeness result in this field of study -- necessary and sufficient graphical conditions under which, the full data distribution can be recovered from the observed data distribution. We then simultaneously address issues that may arise due to the presence of both missing data and unmeasured confounding, by extending these graphical conditions and proofs of completeness, to settings where some variables are not just missing, but completely unobserved.

preprint2020arXiv

General Identification of Dynamic Treatment Regimes Under Interference

In many applied fields, researchers are often interested in tailoring treatments to unit-level characteristics in order to optimize an outcome of interest. Methods for identifying and estimating treatment policies are the subject of the dynamic treatment regime literature. Separately, in many settings the assumption that data are independent and identically distributed does not hold due to inter-subject dependence. The phenomenon where a subject's outcome is dependent on his neighbor's exposure is known as interference. These areas intersect in myriad real-world settings. In this paper we consider the problem of identifying optimal treatment policies in the presence of interference. Using a general representation of interference, via Lauritzen-Wermuth-Freydenburg chain graphs (Lauritzen and Richardson, 2002), we formalize a variety of policy interventions under interference and extend existing identification theory (Tian, 2008; Sherman and Shpitser, 2018). Finally, we illustrate the efficacy of policy maximization under interference in a simulation study.

preprint2020arXiv

Identification Methods With Arbitrary Interventional Distributions as Inputs

Causal inference quantifies cause-effect relationships by estimating counterfactual parameters from data. This entails using \emph{identification theory} to establish a link between counterfactual parameters of interest and distributions from which data is available. A line of work characterized non-parametric identification for a wide variety of causal parameters in terms of the \emph{observed data distribution}. More recently, identification results have been extended to settings where experimental data from interventional distributions is also available. In this paper, we use Single World Intervention Graphs and a nested factorization of models associated with mixed graphs to give a very simple view of existing identification theory for experimental data. We use this view to yield general identification algorithms for settings where the input distributions consist of an arbitrary set of observational and experimental distributions, including marginal and conditional distributions. We show that for problems where inputs are interventional marginal distributions of a certain type (ancestral marginals), our algorithm is complete.

preprint2020arXiv

Nonmyopic and pseudo-nonmyopic approaches to optimal sequential design in the presence of covariates

In sequential experiments, subjects become available for the study over a period of time, and covariates are often measured at the time of arrival. We consider the setting where the sample size is fixed but covariate values are unknown until subjects enrol. Given a model for the outcome, a sequential optimal design approach can be used to allocate treatments to minimize the variance of the treatment effect. We extend existing optimal design methodology so it can be used within a nonmyopic framework, where treatment allocation for the current subject depends not only on the treatments and covariates of the subjects already enrolled in the study, but also the impact of possible future treatment assignments. The nonmyopic approach is computationally expensive as it requires recursive formulae. We propose a pseudo-nonmyopic approach which has a similar aim to the nonmyopic approach, but does not involve recursion and instead relies on simulations of future possible decisions. Our simulation studies show that the myopic approach is the most efficient for the logistic model case with a single binary covariate and binary treatment.

preprint2015arXiv

Causal Inference with a Graphical Hierarchy of Interventions

Identifying causal parameters from observational data is fraught with subtleties due to the issues of selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another. Increasingly complex effects of interest, coupled with a diversity of causal models in use resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine if a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures. In this paper, we give a unifying view of a large class of causal effects of interest in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula. Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis, and in settings where treatment assignment has unobserved causes, such as models associated with Pearl's front-door criterion.

preprint2014arXiv

Causal Graphs: Addressing the Confounding Problem Without Instruments or Ignorability

Discussion of "Instrumental Variables: An Econometrician's Perspective" by Guido W. Imbens [arXiv:1410.0163].

preprint2013arXiv

Counterfactual Graphical Models for Longitudinal Mediation Analysis with Unobserved Confounding

Questions concerning mediated causal effects are of great interest in psychology, cognitive science, medicine, social science, public health, and many other disciplines. For instance, about 60% of recent papers published in leading journals in social psychology contain at least one mediation test (Rucker, Preacher, Tormala, & Petty, 2011). Standard parametric approaches to mediation analysis employ regression models, and either the "difference method" (Judd & Kenny, 1981), more common in epidemiology, or the "product method" (Baron & Kenny, 1986), more common in the social sciences. In this paper we first discuss a known, but perhaps often unappreciated fact: that these parametric approaches are a special case of a general counterfactual framework for reasoning about causality first described by Neyman (1923), and Rubin (1974), and linked to causal graphical models by J. Robins (1986), and Pearl (2000). We then show a number of advantages of this framework. First, it makes the strong assumptions underlying mediation analysis explicit. Second, it avoids a number of problems present in the product and difference methods, such as biased estimates of effects in certain cases. Finally, we show the generality of this framework by proving a novel result which allows mediation analysis to be applied to longitudinal settings with unobserved confounders.

preprint2013arXiv

Sparse Nested Markov models with Log-linear Parameters

Hidden variables are ubiquitous in practical data analysis, and therefore modeling marginal densities and doing inference with the resulting models is an important problem in statistics, machine learning, and causal inference. Recently, a new type of graphical model, called the nested Markov model, was developed which captures equality constraints found in marginals of directed acyclic graph (DAG) models. Some of these constraints, such as the so called `Verma constraint', strictly generalize conditional independence. To make modeling and inference with nested Markov models practical, it is necessary to limit the number of parameters in the model, while still correctly capturing the constraints in the marginal of a DAG model. Placing such limits is similar in spirit to sparsity methods for undirected graphical models, and regression models. In this paper, we give a log-linear parameterization which allows sparse modeling with nested Markov models. We illustrate the advantages of this parameterization with a simulation study.

preprint2012arXiv

An Efficient Algorithm for Computing Interventional Distributions in Latent Variable Causal Models

Probabilistic inference in graphical models is the task of computing marginal and conditional densities of interest from a factorized representation of a joint probability distribution. Inference algorithms such as variable elimination and belief propagation take advantage of constraints embedded in this factorization to compute such densities efficiently. In this paper, we propose an algorithm which computes interventional distributions in latent variable causal models represented by acyclic directed mixed graphs(ADMGs). To compute these distributions efficiently, we take advantage of a recursive factorization which generalizes the usual Markov factorization for DAGs and the more recent factorization for ADMGs. Our algorithm can be viewed as a generalization of variable elimination to the mixed graph case. We show our algorithm is exponential in the mixed graph generalization of treewidth.

preprint2012arXiv

Effects of Treatment on the Treated: Identification and Generalization

Many applications of causal analysis call for assessing, retrospectively, the effect of withholding an action that has in fact been implemented. This counterfactual quantity, sometimes called "effect of treatment on the treated," (ETT) have been used to to evaluate educational programs, critic public policies, and justify individual decision making. In this paper we explore the conditions under which ETT can be estimated from (i.e., identified in) experimental and/or observational studies. We show that, when the action invokes a singleton variable, the conditions for ETT identification have simple characterizations in terms of causal diagrams. We further give a graphical characterization of the conditions under which the effects of multiple treatments on the treated can be identified, as well as ways in which the ETT estimand can be constructed from both interventional and observational distributions.

preprint2012arXiv

Identification of Conditional Interventional Distributions

The subject of this paper is the elucidation of effects of actions from causal assumptions represented as a directed graph, and statistical knowledge given as a probability distribution. In particular, we are interested in predicting conditional distributions resulting from performing an action on a set of variables and, subsequently, taking measurements of another set. We provide a necessary and sufficient graphical condition for the cases where such distributions can be uniquely computed from the available information, as well as an algorithm which performs this computation whenever the condition holds. Furthermore, we use our results to prove completeness of do-calculus [Pearl, 1995] for the same identification problem.

preprint2012arXiv

On the Validity of Covariate Adjustment for Estimating Causal Effects

Identifying effects of actions (treatments) on outcome variables from observational data and causal assumptions is a fundamental problem in causal inference. This identification is made difficult by the presence of confounders which can be related to both treatment and outcome variables. Confounders are often handled, both in theory and in practice, by adjusting for covariates, in other words considering outcomes conditioned on treatment and covariate values, weighed by probability of observing those covariate values. In this paper, we give a complete graphical criterion for covariate adjustment, which we term the adjustment criterion, and derive some interesting corollaries of the completeness of this criterion.

preprint2012arXiv

Parameter and Structure Learning in Nested Markov Models

The constraints arising from DAG models with latent variables can be naturally represented by means of acyclic directed mixed graphs (ADMGs). Such graphs contain directed and bidirected arrows, and contain no directed cycles. DAGs with latent variables imply independence constraints in the distribution resulting from a 'fixing' operation, in which a joint distribution is divided by a conditional. This operation generalizes marginalizing and conditioning. Some of these constraints correspond to identifiable 'dormant' independence constraints, with the well known 'Verma constraint' as one example. Recently, models defined by a set of the constraints arising after fixing from a DAG with latents, were characterized via a recursive factorization and a nested Markov property. In addition, a parameterization was given in the discrete case. In this paper we use this parameterization to describe a parameter fitting algorithm, and a search and score structure learning algorithm for these nested Markov models. We apply our algorithms to a variety of datasets.

preprint2012arXiv

Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis

While estimation of the marginal (total) causal effect of a point exposure on an outcome is arguably the most common objective of experimental and observational studies in the health and social sciences, in recent years, investigators have also become increasingly interested in mediation analysis. Specifically, upon evaluating the total effect of the exposure, investigators routinely wish to make inferences about the direct or indirect pathways of the effect of the exposure, through a mediator variable or not, that occurs subsequently to the exposure and prior to the outcome. Although powerful semiparametric methodologies have been developed to analyze observational studies that produce double robust and highly efficient estimates of the marginal total causal effect, similar methods for mediation analysis are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about so-called marginal natural direct and indirect causal effects, while appropriately accounting for a large number of pre-exposure confounding factors for the exposure and the mediator variables. Our analytic framework is particularly appealing, because it gives new insights on issues of efficiency and robustness in the context of mediation analysis. In particular, we propose new multiply robust locally efficient estimators of the marginal natural indirect and direct causal effects, and develop a novel double robust sensitivity analysis framework for the assumption of ignorability of the mediator variable.

preprint2012arXiv

What Counterfactuals Can Be Tested

Counterfactual statements, e.g., "my headache would be gone had I taken an aspirin" are central to scientific discourse, and are formally interpreted as statements derived from "alternative worlds". However, since they invoke hypothetical states of affairs, often incompatible with what is actually known or observed, testing counterfactuals is fraught with conceptual and practical difficulties. In this paper, we provide a complete characterization of "testable counterfactuals," namely, counterfactual statements whose probabilities can be inferred from physical experiments. We provide complete procedures for discerning whether a given counterfactual is testable and, if so, expressing its probability in terms of experimental data.

Ilya Shpitser

What is connected

Connect this record

See the researcher in context

Building this map preview

23 published item(s)

Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference

Optimal Training of Fair Predictive Models

Differentiable Causal Discovery Under Unmeasured Confounding

Generating Synthetic Text Data to Evaluate Causal Inference Methods

A Semiparametric Approach to Interpretable Machine Learning

Causal inference, social networks, and chain graphs

Counterexamples to "The Blessings of Multiple Causes" by Wang and Blei

Deriving Bounds and Inequality Constraints Using LogicalRelations Among Counterfactuals

Full Law Identification In Graphical Models Of Missing Data: Completeness Results

General Identification of Dynamic Treatment Regimes Under Interference

Identification Methods With Arbitrary Interventional Distributions as Inputs

Nonmyopic and pseudo-nonmyopic approaches to optimal sequential design in the presence of covariates

Causal Inference with a Graphical Hierarchy of Interventions

Causal Graphs: Addressing the Confounding Problem Without Instruments or Ignorability

Counterfactual Graphical Models for Longitudinal Mediation Analysis with Unobserved Confounding

Sparse Nested Markov models with Log-linear Parameters

An Efficient Algorithm for Computing Interventional Distributions in Latent Variable Causal Models

Effects of Treatment on the Treated: Identification and Generalization

Identification of Conditional Interventional Distributions

On the Validity of Covariate Adjustment for Estimating Causal Effects

Parameter and Structure Learning in Nested Markov Models

Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis

What Counterfactuals Can Be Tested