Source author record

Michael Baiocchi

Michael Baiocchi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications Methodology Computation math.ST physics.ed-ph Statistics Theory

Catalog footprint

What is connected

6works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

stratamatch: Prognostic ScoreStratification using a Pilot Design

Optimal propensity score matching has emerged as one of the most ubiquitous approaches for causal inference studies on observational data; However, outstanding critiques of the statistical properties of propensity score matching have cast doubt on the statistical efficiency of this technique, and the poor scalability of optimal matching to large data sets makes this approach inconvenient if not infeasible for sample sizes that are increasingly commonplace in modern observational data. The stratamatch package provides implementation support and diagnostics for `stratified matching designs,' an approach which addresses both of these issues with optimal propensity score matching for large-sample observational studies. First, stratifying the data enables more computationally efficient matching of large data sets. Second, stratamatch implements a `pilot design' approach in order to stratify by a prognostic score, which may increase the precision of the effect estimate and increase power in sensitivity analyses of unmeasured confounding.

preprint2020arXiv

A Pilot Design for Observational Studies: Using Abundant Data Thoughtfully

Observational studies often benefit from an abundance of observational units. This can lead to studies that -- while challenged by issues of internal validity -- have inferences derived from sample sizes substantially larger than randomized controlled trials. But is the information provided by an observational unit best used in the analysis phase? We propose the use of `pilot design,' in which observations are expended in the design phase of the study, and the post-treatment information from these observations is used to improve study design. In modern observational studies, which are data rich but control poor, pilot designs can be used to gain information about the structure of post-treatment variation. This information can then be used to improve instrumental variable designs, propensity score matching, doubly-robust estimation, and other observational study designs. We illustrate one version of a pilot design, which aims to reduce within-set heterogeneity and improve performance in sensitivity analyses. This version of a pilot design expends observational units during the design phase to fit a prognostic model, avoiding concerns of overfitting. Additionally, it enables the construction of `Assignment-Control (AC) plots,' which visualize the relationship between propensity and prognostic scores. We first show some examples of these plots, then we demonstrate in a simulation setting how this alternative use of the observations can lead to gains in terms of both treatment effect estimation and sensitivity analyses of unobserved confounding.

preprint2020arXiv

Combining Observational and Experimental Datasets Using Shrinkage Estimators

We consider the problem of combining data from observational and experimental sources to make causal conclusions. This problem is increasingly relevant, as the modern era has yielded passive collection of massive observational datasets in areas such as e-commerce and electronic health. These data may be used to supplement experimental data, which is frequently expensive to obtain. In Rosenman et al. (2018), we considered this problem under the assumption that all confounders were measured. Here, we relax the assumption of unconfoundedness. To derive combined estimators with desirable properties, we make use of results from the Stein Shrinkage literature. Our contributions are threefold. First, we propose a generic procedure for deriving shrinkage estimators in this setting, making use of a generalized unbiased risk estimate. Second, we develop two new estimators, prove finite sample conditions under which they have lower risk than an estimator using only experimental data, and show that each achieves a notion of asymptotic optimality. Third, we draw connections between our approach and results in sensitivity analysis, including proposing a method for evaluating the feasibility of our estimators.

preprint2020arXiv

Understanding the spatial burden of gender-based violence: Modelling patterns of violence in Nairobi, Kenya through geospatial information

We present statistical techniques for analyzing global positioning system (GPS) data in order to understand, communicate about, and prevent patterns of violence. In this pilot study, participants in Nairobi, Kenya were asked to rate their safety at several locations, with the goal of predicting safety and learning important patterns. These approaches are meant to help articulate differences in experiences, fostering a discussion that will help communities identify issues and policymakers develop safer communities. A generalized linear mixed model incorporating spatial information taken from existing maps of Kibera showed significant predictors of perceived lack of safety included being alone and time of day; in debrief interviews, participants described feeling unsafe in spaces with hiding places, disease carrying animals, and dangerous individuals. This pilot study demonstrates promise for detecting spatial patterns of violence, which appear to be confirmed by actual rates of measured violence at schools. Several factors relevant to community building consistently predict perceived safety and emerge in participants' qualitative descriptions, telling a cohesive story about perceived safety and empowering communication to community stakeholders.

preprint2016arXiv

Protocol for an Observational Study on the Effects of Playing High School Football on Later Life Cognitive Functioning and Mental Health

A potential causal relationship between head injuries sustained by NFL players and later-life neurological decline may have broad implications for participants in youth and high school football programs. However, brain trauma risk at the professional level may be different than that at the youth and high school levels and the long-term effects of participation at these levels is as-yet unclear. To investigate the effect of playing high school football on later life depression and cognitive functioning, we propose a retrospective observational study using data from the Wisconsin Longitudinal Study (WLS) of graduates from Wisconsin high schools in 1957. We compare 1,153 high school males who played varsity football to 2,751 male students who did not. 1,951 of the control subjects did not play any sport and the remaining 800 controls played a non-contact sport. We focus on two primary outcomes measured at age 65: a composite cognitive outcome measuring verbal fluency and memory and the modified CES-D depression score. To control for potential confounders we adjust for pre-exposure covariates such as IQ with matching and model-based covariate adjustment. We will conduct an ordered testing procedure that uses all 2,751 controls while controlling for possible unmeasured differences between students who played sports and those who did not. We will quantitatively assess the sensitivity of the results to potential unmeasured confounding. The study will also consider several secondary outcomes of clinical interest such as aggression and heavy drinking. The rich set of pre-exposure variables, relatively unbiased sampling, and longitudinal nature of the WLS dataset make the proposed analysis unique among related studies that rely primarily on convenience samples of football players with reported neurological symptoms.

preprint2014arXiv

Peer assessment enhances student learning

Feedback has a powerful influence on learning, but it is also expensive to provide. In large classes, it may even be impossible for instructors to provide individualized feedback. Peer assessment has received attention lately as a way of providing personalized feedback that scales to large classes. Besides these obvious benefits, some researchers have also conjectured that students learn by peer assessing, although no studies have ever conclusively demonstrated this effect. By conducting a randomized controlled trial in an introductory statistics class, we provide evidence that peer assessment causes significant gains in student achievement. The strength of our conclusions depends critically on the careful design of the experiment, which was made possible by a web-based platform that we developed. Hence, our study is also a proof of concept of the high-quality experiments that are possible with online tools.

Michael Baiocchi

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

stratamatch: Prognostic ScoreStratification using a Pilot Design

A Pilot Design for Observational Studies: Using Abundant Data Thoughtfully

Combining Observational and Experimental Datasets Using Shrinkage Estimators

Understanding the spatial burden of gender-based violence: Modelling patterns of violence in Nairobi, Kenya through geospatial information

Protocol for an Observational Study on the Effects of Playing High School Football on Later Life Cognitive Functioning and Mental Health

Peer assessment enhances student learning