Source author record

Jose R. Zubizarreta

Jose R. Zubizarreta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Methodology, Applications, math.ST, Statistics Theory

Catalog footprint

6 works, 4 topics, 4 close collaborators


Preprint, 2022, arXiv

Complex Discontinuity Designs Using Covariates: Impact of School Grade Retention on Later Life Outcomes in Chile

Regression discontinuity designs are extensively used for causal inference in observational studies. However, they are usually confined to settings with simple treatment rules, determined by a single running variable, with a single cutoff. Motivated by the problem of estimating the impact of grade retention on educational and juvenile crime outcomes in Chile, we propose a framework and methods for complex discontinuity designs that encompasses multiple treatment rules. In this framework, the observed covariates play a central role for identification, estimation, and generalization of causal effects. Identification is non-parametric and relies on a local strong ignorability assumption. Estimation proceeds as in any observational study under strong ignorability, yet in a neighborhood of the cutoffs of the running variables. We discuss estimation approaches based on matching and weighting, including complementary regression modeling adjustments. We present assumptions for generalization; that is, for identification and estimation of average treatment effects for target populations. We also describe two approaches to select the neighborhood for analysis. We find that grade retention in Chile has a negative impact on future grade retention, but is not associated with dropping out of school or committing a juvenile crime.

Preprint, 2022, arXiv

On the implied weights of linear regression for causal inference

A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Now, linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling, and study representativeness? In this paper, we provide answers to this and related questions by analyzing the implied (individual-level data) weights of linear regression methods. We derive new closed-form expressions of the weights and examine their properties in both finite and asymptotic regimes. We show that the implied weights of general regression problems can be equivalently obtained by solving a convex optimization problem. Among others, we study doubly and multiply robust properties of regression estimators from the perspective of their implied weights. This equivalence allows us to bridge ideas from the regression modeling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that are part of the design stage of an observational study. As special cases, we analyze the implied weights in common settings such as multi-valued treatments and regression adjustment after matching. We implement the weights and diagnostics in the new lmw package for R.
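To illustrate what "implied weights" means, here is a minimal sketch, not the paper's closed-form expressions and not the lmw package. It assumes a single covariate and the model y ~ 1 + z + x, and it uses the standard Frisch-Waugh-Lovell result: the OLS coefficient on z is a weighted sum of the outcomes, with weights proportional to the residual of z after regressing it on the intercept and x. The function name `implied_weights` and the toy data are hypothetical.

```python
def implied_weights(z, x):
    """Implied OLS weights for the coefficient on z in y ~ 1 + z + x.

    Illustrative only: one covariate, weights obtained by residualizing
    z on (1, x) and scaling (Frisch-Waugh-Lovell).
    """
    n = len(z)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    # Residual of z after regressing it on an intercept and x.
    resid = [zi - z_bar - slope * (xi - x_bar) for xi, zi in zip(x, z)]
    denom = sum(r * r for r in resid)
    return [r / denom for r in resid]


# Hypothetical toy data: 3 treated units, 5 controls, one covariate.
z = [1, 1, 1, 0, 0, 0, 0, 0]
x = [2.0, 3.5, 5.0, 1.0, 2.0, 3.0, 4.0, 6.0]
# Outcome generated exactly from a linear model with treatment effect 3.
y = [2.0 + 3.0 * zi + 0.5 * xi for zi, xi in zip(z, x)]

w = implied_weights(z, x)
tau_hat = sum(wi * yi for wi, yi in zip(w, y))
treated_sum = sum(wi for wi, zi in zip(w, z) if zi == 1)
control_sum = sum(wi for wi, zi in zip(w, z) if zi == 0)
imbalance = sum(wi * xi for wi, xi in zip(w, x))
```

In this sketch the weights sum to 1 over the treated units and to -1 over the controls, and the weighted covariate difference `imbalance` is zero up to floating-point error. That is the sense in which a linear regression adjustment balances covariate means exactly, although some individual weights may be negative.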

Preprint, 2022, arXiv

Profile Matching for the Generalization and Personalization of Causal Inferences

We introduce profile matching, a multivariate matching method for randomized experiments and observational studies that finds the largest possible unweighted samples across multiple treatment groups that are balanced relative to a covariate profile. This covariate profile can represent a specific population or a target individual, facilitating the generalization and personalization of causal inferences. For generalization, because the profile often amounts to summary statistics for a target population, profile matching does not always require accessing individual-level data, which may be unavailable for confidentiality reasons. For personalization, the profile comprises the characteristics of a single individual. Profile matching achieves covariate balance by construction, but unlike existing approaches to matching, it does not require specifying a matching ratio, as this is implicitly optimized for the data. The method can also be used for the selection of units for study follow-up, and it readily applies to multi-valued treatments with many treatment categories. We evaluate the performance of profile matching in a simulation study of the generalization of a randomized trial to a target population. We further illustrate this method in an exploratory observational study of the relationship between opioid use and mental health outcomes. We analyze these relationships for three covariate profiles representing: (i) sexual minorities, (ii) the Appalachian United States, and (iii) the characteristics of a hypothetical vulnerable patient. The method can be implemented via the new function profmatch in the designmatch package for R, for which we provide a step-by-step tutorial.
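The core idea can be sketched with a brute-force search. This is an illustration under strong simplifying assumptions, not the profmatch function in designmatch: there is one covariate (age), balance means that the subset mean lies within a tolerance of the profile value, and every subset is enumerated, which is only feasible for tiny groups. The function name, the data, and the tolerance are all hypothetical.

```python
from itertools import combinations


def largest_balanced_subset(values, target, tol):
    """Return indices of a largest subset whose mean is within tol of target.

    Brute force over subsets, from largest to smallest; illustrative only.
    """
    n = len(values)
    for size in range(n, 0, -1):
        for idx in combinations(range(n), size):
            subset_mean = sum(values[i] for i in idx) / size
            if abs(subset_mean - target) <= tol:
                return idx
    return ()


# Hypothetical profile: a target population with mean age 40.
profile_age = 40.0
tolerance = 1.0

# Hypothetical ages in two treatment groups.
groups = {
    "treatment_a": [26, 38, 41, 43, 60],
    "treatment_b": [35, 39, 45, 70],
}

# Each group is matched to the profile separately, so no matching ratio
# between groups has to be specified in advance.
selected = {
    name: largest_balanced_subset(ages, profile_age, tolerance)
    for name, ages in groups.items()
}
```

Here only summary information about the target (its mean age) is needed, which mirrors the point that individual-level target data are not always required. A practical implementation would solve an integer program instead of enumerating subsets.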

Preprint, 2022, arXiv

Using Cardinality Matching to Design Balanced and Representative Samples for Observational Studies

Cardinality matching is a computational method for finding the largest possible number of matched pairs of exposed and unexposed individuals from an observational dataset, with specified patterns of baseline characteristics that represent a target population for analysis. This article explains the process of cardinality matching and how it simultaneously addresses the concerns of balance, sample size, and representativeness of matched samples in observational studies.
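As a rough sketch of the optimization behind this description, the following brute-force search looks for the largest equal-sized samples of exposed and unexposed units whose mean ages differ by at most a tolerance. It is an assumption-laden toy, not the article's procedure: it uses one covariate, a mean-balance constraint only, no target-population constraint, and exhaustive enumeration in place of integer programming. All names and data are hypothetical.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def cardinality_match(treated, control, tol):
    """Largest equal-sized treated/control samples with |mean difference| <= tol.

    Exhaustive search from the largest feasible size downward; illustrative only.
    """
    max_size = min(len(treated), len(control))
    for size in range(max_size, 0, -1):
        for t_idx in combinations(range(len(treated)), size):
            t_mean = mean([treated[i] for i in t_idx])
            for c_idx in combinations(range(len(control)), size):
                c_mean = mean([control[j] for j in c_idx])
                if abs(t_mean - c_mean) <= tol:
                    return t_idx, c_idx
    return (), ()


# Hypothetical ages of exposed and unexposed individuals.
treated_age = [30, 40, 95]
control_age = [28, 31, 39, 52, 70]

t_idx, c_idx = cardinality_match(treated_age, control_age, tol=1.0)
matched_size = len(t_idx)
```

In this toy data no three controls can balance all three exposed units, so the search falls back to two matched units per group. This shows the trade-off between sample size and balance that the method resolves by maximizing size subject to the balance constraints.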

Preprint, 2021, arXiv

Minimax Linear Estimation of the Retargeted Mean

Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the common practice of ignoring bias when building confidence intervals with these linear estimators. Focusing on the case that the class of the unknown outcome function is the unit ball of a reproducing kernel Hilbert space, we show that the resulting linear estimator is asymptotically optimal under conditions only marginally stronger than those used with augmented estimators. We establish bounds attesting to the estimator's good finite sample properties. In an extensive simulation study, we observe promising performance of the estimator throughout a wide range of sample sizes, noise levels, and levels of overlap between the covariate distributions of the treated and target populations.

Preprint, 2016, arXiv

Optimal Multilevel Matching in Clustered Observational Studies: A Case Study of the Effectiveness of Private Schools Under a Large-Scale Voucher System

A distinctive feature of a clustered observational study is its multilevel or nested data structure arising from the assignment of treatment, in a non-random manner, to groups or clusters of units or individuals. Examples are ubiquitous in the health and social sciences including patients in hospitals, employees in firms, and students in schools. What is the optimal matching strategy in a clustered observational study? At first thought, one might start by matching clusters of individuals and then, within matched clusters, continue by matching individuals. But as we discuss in this paper, the optimal strategy is the opposite: in typical applications, where the intracluster correlation is not perfect, it is best to first match individuals and, once all possible combinations of matched individuals are known, then match clusters. In this paper we use dynamic and integer programming to implement this strategy and extend optimal matching methods to hierarchical and multilevel settings. Among other matched designs, our strategy can approximate a paired clustered randomized study by finding the largest sample of matched pairs of treated and control individuals within matched pairs of treated and control clusters that is balanced according to specifications given by the investigator. This strategy directly balances covariates both at the cluster and individual levels and does not require estimating the propensity score, although the propensity score can be balanced as an additional covariate. We illustrate our results with a case study of the comparative effectiveness of public versus private voucher schools in Chile, a question of intense policy debate in the country at the present.
