Researcher profile

Jose R. Zubizarreta

Jose R. Zubizarreta contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Complex Discontinuity Designs Using Covariates: Impact of School Grade Retention on Later Life Outcomes in Chile

Regression discontinuity designs are extensively used for causal inference in observational studies. However, they are usually confined to settings with simple treatment rules, determined by a single running variable, with a single cutoff. Motivated by the problem of estimating the impact of grade retention on educational and juvenile crime outcomes in Chile, we propose a framework and methods for complex discontinuity designs that encompasses multiple treatment rules. In this framework, the observed covariates play a central role for identification, estimation, and generalization of causal effects. Identification is non-parametric and relies on a local strong ignorability assumption. Estimation proceeds as in any observational study under strong ignorability, yet in a neighborhood of the cutoffs of the running variables. We discuss estimation approaches based on matching and weighting, including complementary regression modeling adjustments. We present assumptions for generalization; that is, for identification and estimation of average treatment effects for target populations. We also describe two approaches to select the neighborhood for analysis. We find that grade retention in Chile has a negative impact on future grade retention, but is not associated with dropping out of school or committing a juvenile crime.

preprint2022arXiv

On the implied weights of linear regression for causal inference

A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Now, linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling, and study representativeness? In this paper, we provide answers to this and related questions by analyzing the implied (individual-level data) weights of linear regression methods. We derive new closed-form expressions of the weights and examine their properties in both finite and asymptotic regimes. We show that the implied weights of general regression problems can be equivalently obtained by solving a convex optimization problem. Among others, we study doubly and multiply robust properties of regression estimators from the perspective of their implied weights. This equivalence allows us to bridge ideas from the regression modeling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that are part of the design stage of an observational study. As special cases, we analyze the implied weights in common settings such as multi-valued treatments and regression adjustment after matching. We implement the weights and diagnostics in the new lmw package for R.

preprint2022arXiv

Profile Matching for the Generalization and Personalization of Causal Inferences

We introduce profile matching, a multivariate matching method for randomized experiments and observational studies that finds the largest possible unweighted samples across multiple treatment groups that are balanced relative to a covariate profile. This covariate profile can represent a specific population or a target individual, facilitating the generalization and personalization of causal inferences. For generalization, because the profile often amounts to summary statistics for a target population, profile matching does not always require accessing individual-level data, which may be unavailable for confidentiality reasons. For personalization, the profile comprises the characteristics of a single individual. Profile matching achieves covariate balance by construction, but unlike existing approaches to matching, it does not require specifying a matching ratio, as this is implicitly optimized for the data. The method can also be used for the selection of units for study follow-up, and it readily applies to multi-valued treatments with many treatment categories. We evaluate the performance of profile matching in a simulation study of the generalization of a randomized trial to a target population. We further illustrate this method in an exploratory observational study of the relationship between opioid use and mental health outcomes. We analyze these relationships for three covariate profiles representing: (i) sexual minorities, (ii) the Appalachian United States, and (iii) the characteristics of a hypothetical vulnerable patient. The method can be implemented via the new function profmatch in the designmatch package for R, for which we provide a step-by-step tutorial.

preprint2022arXiv

Using Cardinality Matching to Design Balanced and Representative Samples for Observational Studies

Cardinality matching is a computational method for finding the largest possible number of matched pairs of exposed and unexposed individuals from an observational dataset, with specified patterns of baseline characteristics that represent a target population for analysis. This article explains the process of cardinality matching and how it simultaneously addresses the concerns of balance, sample size, and representativeness of matched samples in observational studies.

preprint2021arXiv

Minimax Linear Estimation of the Retargeted Mean

Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the common practice of ignoring bias when building confidence intervals with these linear estimators. Focusing on the case that the class of the unknown outcome function is the unit ball of a reproducing kernel Hilbert space, we show that the resulting linear estimator is asymptotically optimal under conditions only marginally stronger than those used with augmented estimators. We establish bounds attesting to the estimator's good finite sample properties. In an extensive simulation study, we observe promising performance of the estimator throughout a wide range of sample sizes, noise levels, and levels of overlap between the covariate distributions of the treated and target populations.