Source author record

Luke Keele

Luke Keele appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, Applications, econ.EM

Catalog footprint

10 works

3 topics

4 close collaborators


Preprint, 2022, arXiv

Covariate Adjustment in Regression Discontinuity Designs

The Regression Discontinuity (RD) design is a widely used non-experimental method for causal inference and program evaluation. While its canonical formulation only requires a score and an outcome variable, it is common in empirical work to encounter RD analyses where additional variables are used for adjustment. This practice has led to misconceptions about the role of covariate adjustment in RD analysis, from both methodological and empirical perspectives. In this chapter, we review the different roles of covariate adjustment in RD designs, and offer methodological guidance for its correct use.

Preprint, 2022, arXiv

Nonparametric identification of causal effects in clustered observational studies with differential selection

The clustered observational study (COS) design is the observational study counterpart to the clustered randomized trial. In a COS, a treatment is assigned to intact groups, and all units within the group are exposed to the treatment. However, the treatment is non-randomly assigned. COSs are common in both education and health services research. In education, treatments may be given to all students within some schools but withheld from all students in other schools. In health studies, treatments may be applied to clusters such as hospitals or groups of patients treated by the same physician. In this manuscript, we study the identification of causal effects in clustered observational study designs. We focus on the prospect of differential selection of units to clusters, which occurs when the units' cluster selections depend on the clusters' treatment assignments. Extant work on COSs has made an implicit assumption that rules out the presence of differential selection. We derive the identification results for designs with differential selection and show that contexts with differential cluster selection require different adjustment sets than standard designs. We outline estimators for designs with and without differential selection. Using a series of simulations, we outline the magnitude of the bias that can occur with differential selection. We then present two empirical applications focusing on the likelihood of differential selection.

Preprint, 2021, arXiv

Hospital Quality Risk Standardization via Approximate Balancing Weights

Comparing outcomes across hospitals, often to identify underperforming hospitals, is a critical task in health services research. However, naive comparisons of average outcomes, such as surgery complication rates, can be misleading because hospital case mixes differ: a hospital's overall complication rate may be lower due to more effective treatments or simply because the hospital serves a healthier population overall. In this paper, we develop a method of "direct standardization" where we re-weight each hospital patient population to be representative of the overall population and then compare the weighted averages across hospitals. Adapting methods from survey sampling and causal inference, we find weights that directly control for imbalance between the hospital patient mix and the target population, even across many patient attributes. Critically, these balancing weights can also be tuned to preserve sample size for more precise estimates. We also derive principled measures of statistical precision, and use outcome modeling and Bayesian shrinkage to increase precision and account for variation in hospital size. We demonstrate these methods using claims data from Pennsylvania, Florida, and New York, estimating standardized hospital complication rates for general surgery patients. We conclude with a discussion of how to detect low-performing hospitals.

Preprint, 2020, arXiv

Comparing the Performance of Statistical Adjustment Methods By Recovering the Experimental Benchmark from the REFLUX Trial

Much evidence in comparative effectiveness research is based on observational studies. Researchers who conduct observational studies typically assume that there are no unobservable differences between the treated and control groups. Treatment effects are estimated after adjusting for observed differences between treated and controls. However, treatment effect estimates may be biased due to model misspecification. That is, if the method of treatment effect estimation imposes unduly strong functional form assumptions, treatment effect estimates may be significantly biased. In this study, we compare the performance of a wide variety of treatment effect estimation methods. We do so within the context of the REFLUX study from the UK. In REFLUX, after study qualification, participants were enrolled in either a randomized trial arm or patient preference arm. In the randomized trial, patients were randomly assigned to either surgery or medical management. In the patient preference arm, participants selected to either have surgery or medical management. We attempt to recover the treatment effect estimate from the randomized trial arm using the data from the patient preference arm of the study. We vary the method of treatment effect estimation and record which methods are successful and which are not. We apply over 20 different methods including standard regression models as well as advanced machine learning methods. We find that simple propensity score matching methods perform the worst. We also find significant variation in performance across methods. The wide variation in performance suggests analysts should use multiple methods of estimation as a robustness check.

Preprint, 2020, arXiv

Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs

In non-experimental settings, the Regression Discontinuity (RD) design is one of the most credible identification strategies for program evaluation and causal inference. However, RD treatment effect estimands are necessarily local, making statistical methods for the extrapolation of these effects a key area for development. We introduce a new method for extrapolation of RD effects that relies on the presence of multiple cutoffs, and is therefore design-based. Our approach employs an easy-to-interpret identifying assumption that mimics the idea of "common trends" in difference-in-differences designs. We illustrate our methods with data on a subsidized loan program on post-education attendance in Colombia, and offer new evidence on program effects for students with test scores away from the cutoff that determined program eligibility.

Preprint, 2020, arXiv

Protocol for a Study of the Effect of Surface Mining in Central Appalachia on Adverse Birth Outcomes

Surface mining has become a major method of coal mining in Central Appalachia alongside traditional underground mining. Concerns have been raised about the health effects of surface mining, particularly mountaintop removal mining, in which coal is mined on steep mountaintops by removing the mountaintop through forest clearcutting and explosives. We have designed a matched observational study to assess the effects of surface mining in Central Appalachia on adverse birth outcomes. This protocol describes the study's background and motivation, sample selection, and analysis plan.

Preprint, 2016, arXiv

Optimal Multilevel Matching in Clustered Observational Studies: A Case Study of the Effectiveness of Private Schools Under a Large-Scale Voucher System

A distinctive feature of a clustered observational study is its multilevel or nested data structure arising from the assignment of treatment, in a non-random manner, to groups or clusters of units or individuals. Examples are ubiquitous in the health and social sciences including patients in hospitals, employees in firms, and students in schools. What is the optimal matching strategy in a clustered observational study? At first thought, one might start by matching clusters of individuals and then, within matched clusters, continue by matching individuals. But as we discuss in this paper, the optimal strategy is the opposite: in typical applications, where the intracluster correlation is not perfect, it is best to first match individuals and, once all possible combinations of matched individuals are known, then match clusters. In this paper we use dynamic and integer programming to implement this strategy and extend optimal matching methods to hierarchical and multilevel settings. Among other matched designs, our strategy can approximate a paired clustered randomized study by finding the largest sample of matched pairs of treated and control individuals within matched pairs of treated and control clusters that is balanced according to specifications given by the investigator. This strategy directly balances covariates both at the cluster and individual levels and does not require estimating the propensity score, although the propensity score can be balanced as an additional covariate. We illustrate our results with a case study of the comparative effectiveness of public versus private voucher schools in Chile, a question of intense policy debate in the country at the present.

Preprint, 2015, arXiv

The Plateau Problem in the Heteroskedastic Probit Model

In parameter determination for the heteroskedastic probit model, both in simulated data and in actual data, we observe a failure of traditional local search methods to converge consistently to a single parameter vector, in contrast to the typical situation for the regular probit model. We identify features of the heteroskedastic probit log likelihood function that we argue tend to lead to this failure, and suggest ways to amend the local search methods to remedy the problem.

Preprint, 2014, arXiv

Variable-Ratio Matching with Fine Balance in a Study of Peer Health Exchange

In observational studies of treatment effects, matched samples are created so treated and control groups are similar in terms of observable covariates. Traditionally such matched samples consist of matched pairs. If a pair match fails to make treated and control units sufficiently comparable, alternative forms of matching may be necessary. One general strategy to improve balance is to match a variable number of control units to each treated unit. A more tailored strategy is to adopt a fine balance constraint. Under a fine balance constraint, a nominal covariate is exactly balanced, but it does not require individually matched treated and control subjects for this variable. In the example, we seek to construct a matched sample for an ongoing evaluation of Peer Health Exchange, an intervention in schools designed to decrease risky health behaviors among youth. We find that an optimal pair match that minimizes distances between pairs creates a matched sample where balance is poor. Here we propose a method to allow for fine balance constraints when each treated unit is matched to a variable number of control units, which is not currently possible using existing matching algorithms. Our approach uses the entire number to first determine the optimal number of controls for each treated unit. For each stratum of matched treated units, we can then apply a fine balance constraint. We then demonstrate that a matched sample for the evaluation of the Peer Health Exchange based on a variable number of controls and fine balance constraint is superior to simply using a variable-ratio match.

Preprint, 2010, arXiv

Identification, Inference and Sensitivity Analysis for Causal Mediation Effects

Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome variables. In this paper we first prove that under a particular version of the sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identification assumption with those proposed in the literature. Some practical implications of our identification result are also discussed. In particular, the popular estimator based on the linear structural equation model (LSEM) can be interpreted as an ACME estimator once additional parametric assumptions are made. We show that these assumptions can easily be relaxed within and outside of the LSEM framework and propose simple nonparametric estimation strategies. Second, and perhaps most importantly, we propose a new sensitivity analysis that can be easily implemented by applied researchers within the LSEM framework. Like the existing identifying assumptions, the proposed sequential ignorability assumption may be too strong in many applied settings. Thus, sensitivity analysis is essential in order to examine the robustness of empirical findings to the possible existence of an unmeasured confounder. Finally, we apply the proposed methods to a randomized experiment from political psychology. We also make easy-to-use software available to implement the proposed methods.
