Source author record

Dylan S. Small

Dylan S. Small appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Methodology, math.ST, Statistics Theory, Computation, Neurons and Cognition, Populations and Evolution

Catalog footprint

What is connected

27 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

Randomization Inference for Cluster-Randomized Test-Negative Designs with Application to Dengue Studies: Unbiased estimation, Partial compliance, and Stepped-wedge design

In 2019, the World Health Organization identified dengue as one of the top ten global health threats. For the control of dengue, the Applying Wolbachia to Eliminate Dengue (AWED) study group conducted a cluster-randomized trial in Yogyakarta, Indonesia, and used a novel design, called the cluster-randomized test-negative design (CR-TND). This design can yield valid statistical inference with data collected by a passive surveillance system and thus has the advantage of cost-efficiency compared to traditional cluster-randomized trials. We investigate the statistical assumptions and properties of CR-TND under a randomization inference framework, which is known to be robust and efficient for small-sample problems. We find that, when the differential healthcare-seeking behavior comparing intervention and control varies across clusters (in contrast to the setting of Dufault and Jewell, 2020 where the differential healthcare-seeking behavior is constant across clusters), current analysis methods for CR-TND can be biased and have inflated type I error. We propose the log-contrast estimator that can eliminate such bias and improve precision by adjusting for covariates. Furthermore, we extend our methods to handle partial intervention compliance and a stepped-wedge design, both of which appear frequently in cluster-randomized trials. Finally, we demonstrate our results by simulation studies and re-analysis of the AWED study.
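To make "randomization inference" at the cluster level concrete, here is a minimal sketch of a generic cluster-level permutation test. It summarizes each cluster by the log odds of testing positive among tested participants, then compares intervention and control clusters by enumerating every assignment with the same number of intervention clusters. This is not the log-contrast estimator proposed in the paper; the function names, the 0.5 continuity correction, and the one-sided direction (intervention lowers test-positive odds) are all assumptions made for illustration.

```python
from itertools import combinations
from math import log
from statistics import mean


def cluster_log_odds(pos, neg):
    # Log odds of testing positive among tested participants in one cluster.
    # The 0.5 continuity correction is an assumption that avoids log(0).
    return log((pos + 0.5) / (neg + 0.5))


def randomization_p_value(stats, treated):
    """One-sided exact randomization p-value for a difference in cluster means.

    stats: one summary statistic per cluster.
    treated: indices of the intervention clusters actually assigned.
    """
    n, k = len(stats), len(treated)

    def diff(group):
        g = set(group)
        return (mean(stats[i] for i in g)
                - mean(stats[i] for i in range(n) if i not in g))

    observed = diff(treated)
    # Re-randomize: every way of choosing k intervention clusters out of n.
    draws = [diff(c) for c in combinations(range(n), k)]
    # Proportion of assignments at least as extreme (as low) as observed.
    return sum(d <= observed for d in draws) / len(draws)
```

With four hypothetical clusters where the two intervention clusters have the lowest test-positive odds, the observed split is the most extreme of the six possible assignments, so the p-value is 1/6; with so few clusters, that is the smallest p-value the design can produce.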

Preprint, 2020, arXiv

A Nonparametric Likelihood Approach for Inference in Instrumental Variable Models

Instrumental variable methods allow for inference about the treatment effect by controlling for unmeasured confounding in randomized experiments with noncompliance. However, many studies do not consider the observed compliance behavior in the testing procedure, which can lead to a loss of power. In this paper, we propose a novel nonparametric likelihood approach, referred to as the binomial likelihood (BL) method, that incorporates information on compliance behavior while overcoming several limitations of previous techniques and utilizing the advantages of likelihood methods. Our proposed method produces proper estimates of the counterfactual distribution functions by maximizing the binomial likelihood over the space of distribution functions. Using this we propose two versions of a binomial likelihood ratio test for the null hypothesis of no treatment effect. We show that both versions are more powerful to detect any distributional change than existing methods in finite sample cases, and are asymptotically equivalent to the two-sample Anderson-Darling test. We also develop an efficient algorithm for computing our estimates, and apply the binomial likelihood method to a study of the effect of Medicaid coverage on mental health using the Oregon Health Insurance Experiment.

Preprint, 2020, arXiv

A Test for Differential Ascertainment in Case-Control Studies with Application to Child Maltreatment

We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. We can alleviate the effect of such bias using the estimates that our method of testing for differential ascertainment naturally provides. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.

Preprint, 2020, arXiv

Constructing a More Closely Matched Control Group in a Difference-in-Differences Analysis: Its Effect on History Interacting with Group Bias

Difference-in-differences analysis with a control group that differs considerably from a treated group is vulnerable to bias from historical events that have different effects on the groups. Constructing a more closely matched control group by matching a subset of the overall control group to the treated group may result in less bias. We study this phenomenon in simulation studies. We study the effect of mountaintop removal mining (MRM) on mortality using a difference-in-differences analysis that makes use of the increase in MRM following the 1990 Clean Air Act Amendments. For a difference-in-differences analysis of the effect of MRM on mortality, we constructed a more closely matched control group and found a 95% confidence interval that contains substantial adverse effects along with no effect and small beneficial effects.
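The basic difference-in-differences contrast, and the idea of restricting to a more closely matched subset of controls, can be sketched in a few lines. The matching rule here (keep the k controls whose pre-period outcome is closest to the treated pre-period mean) is a hypothetical simplification chosen for illustration; it is not the matching procedure used in the paper, which would typically match on several covariates.

```python
from statistics import mean


def did(treated_pre, treated_post, control_pre, control_post):
    # Change in the treated group minus change in the control group.
    return ((mean(treated_post) - mean(treated_pre))
            - (mean(control_post) - mean(control_pre)))


def closest_controls(treated_pre, control_pre, k):
    # Hypothetical matching rule: keep the k control units whose
    # pre-period outcome is closest to the treated pre-period mean.
    target = mean(treated_pre)
    order = sorted(range(len(control_pre)),
                   key=lambda i: abs(control_pre[i] - target))
    return sorted(order[:k])
```

For example, `did([10, 12], [14, 16], [8, 10], [9, 11])` gives (15 - 11) - (10 - 9) = 3. Passing only the indices returned by `closest_controls` into `did` gives the estimate based on the more closely matched control group.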

Preprint, 2020, arXiv

ivmodel: An R Package for Inference and Sensitivity Analysis of Instrumental Variables Models with One Endogenous Variable

We present ivmodel, a comprehensive R package for analyzing instrumental variables with one endogenous variable. The package implements a general class of estimators called k-class estimators and two confidence intervals that are fully robust to weak instruments. The package also provides power formulas for various test statistics in instrumental variables. Finally, the package contains methods for sensitivity analysis to examine the sensitivity of the inference to instrumental variables assumptions. We demonstrate the software on the data set from Card (1995), looking at the causal effect of levels of education on log earnings where the instrument is proximity to a four-year college.
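The k-class family can be written as beta(k) = [x'(I - k M_Z) y] / [x'(I - k M_Z) x], where M_Z = I - P_Z and P_Z projects onto the instruments. Setting k = 0 gives ordinary least squares and k = 1 gives two-stage least squares. The sketch below is in Python rather than R and does not use the ivmodel API. It assumes a single endogenous regressor, a single instrument, and no intercept or exogenous covariates (variables already centered); these are simplifications for illustration.

```python
def k_class(y, x, z, k):
    """k-class estimate of beta in y = beta * x + error, using one
    instrument z and no intercept (variables assumed centered)."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    zz = dot(z, z)
    # x'P_Z y and x'P_Z x, where P_Z projects onto the span of z.
    xpy = dot(z, x) * dot(z, y) / zz
    xpx = dot(z, x) ** 2 / zz
    # x'(I - k M_Z) y with M_Z = I - P_Z equals (1 - k) x'y + k x'P_Z y.
    num = (1 - k) * dot(x, y) + k * xpy
    den = (1 - k) * dot(x, x) + k * xpx
    return num / den
```

With k = 0 this reduces to x'y / x'x, and with k = 1 it reduces to the simple IV ratio z'y / z'x. Estimators such as LIML and Fuller correspond to other, data-dependent choices of k.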

Preprint, 2020, arXiv

Protocol for a Study of the Effect of Surface Mining in Central Appalachia on Adverse Birth Outcomes

Surface mining has become a major method of coal mining in Central Appalachia alongside traditional underground mining. Concerns have been raised about the health effects of surface mining, particularly mountaintop removal mining, in which coal is mined from steep mountaintops by removing the mountaintop through clearcutting forests and using explosives. We have designed a matched observational study to assess the effects of surface mining in Central Appalachia on adverse birth outcomes. This protocol describes the study's background and motivation, sample selection, and analysis plan.

Preprint, 2020, arXiv

Protocol for an Observational Study on the Effects of Early-Life Participation in Contact Sports on Later-Life Cognition in a Sample of Monozygotic and Dizygotic Swedish Twins Reared Together and Twins Reared Apart

A large body of work links traumatic brain injury (TBI) in adulthood to the onset of Alzheimer's disease (AD). AD is the chief cause of dementia, leading to reduced cognitive capacity and autonomy and increased mortality risk. More recently, researchers have sought to investigate whether TBI experienced in early life may influence trajectories of cognitive dysfunction in adulthood. It has been speculated that early-life participation in collision sports may lead to poor cognitive and mental health outcomes. However, to date, the few studies that have investigated this relationship have produced mixed results. We propose to extend this literature by conducting a prospective study on the effects of early-life participation in collision sports on later-life cognitive health using the Swedish Adoption/Twin Study on Aging (SATSA). The SATSA is unique in its sampling of monozygotic and dizygotic twins reared together (MZT and DZT, respectively) and twins reared apart (MZA and DZA, respectively). The proposed analysis is a prospective study of 660 individuals comprising 270 twin pairs and 120 singletons. Seventy-eight individuals (11.8%) reported participation in collision sports. Our primary outcome will be an indicator of cognitive impairment determined by scores on the Mini-Mental State Examination (MMSE). We will also consider several secondary cognitive outcomes, including verbal and spatial ability, memory, and processing speed. Our sample will be restricted to individuals with at least one MMSE score out of seven repeated assessments spaced approximately three years apart. We will adjust for age, sex, and education in each of our models.

Preprint, 2020, arXiv

Protocol for an Observational Study on the Effects of Social Distancing on Influenza-Like Illness and COVID-19

The novel coronavirus disease (COVID-19) is a highly contagious respiratory disease that was first detected in Wuhan, China in December 2019 and has since spread around the globe, claiming more than 69,000 lives by the time this protocol was written. It has been widely acknowledged that the most effective public policy to mitigate the pandemic is social and physical distancing: keeping at least six feet away from people, working from home, closing non-essential businesses, etc. There is a great deal of anecdotal evidence suggesting that social distancing has a causal effect on disease mitigation; however, few studies have investigated the effect of social distancing on disease mitigation in a transparent and statistically sound manner. We propose to perform an optimal non-bipartite matching to pair counties with similar observed covariates but vastly different average social distancing scores during the first week (March 16th through March 22nd) of the President's "15 Days to Slow the Spread" campaign. We have produced a total of 302 pairs of U.S. counties with good covariate balance on a total of 16 important variables. Our primary outcome will be the average observed illness collected by Kinsa Inc. two weeks after the intervention period. Although observed illness does not directly measure COVID-19, it reflects a real-time aspect of the pandemic, and unlike confirmed cases, it is much less confounded by counties' testing capabilities. We also consider observed illness three weeks after the intervention period as a secondary outcome. We will test a proportional treatment effect using a randomization-based test with covariance adjustment and conduct a sensitivity analysis.

Preprint, 2020, arXiv

Two Robust Tools for Inference about Causal Effects with Invalid Instruments

Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. However, in practice, some of the putative instrumental variables are likely to be invalid. This paper presents two tools to conduct valid inference and tests in the presence of invalid instruments. First, we propose a simple and general approach to construct confidence intervals based on taking unions of well-known confidence intervals. Second, we propose a novel test for the null causal effect based on a collider bias. Our two proposals, especially when fused together, outperform traditional instrumental variable confidence intervals when invalid instruments are present, and can also be used as a sensitivity analysis when there is concern that instrumental variables assumptions are violated. The new approach is applied to a Mendelian randomization study on the causal effect of low-density lipoprotein on the incidence of cardiovascular diseases.
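The "union of confidence intervals" idea amounts to this: compute a standard IV interval under each candidate assumption about which instruments are valid, then report every value that appears in at least one of them. The sketch below shows only the final union step. The input intervals are hypothetical placeholders, and the step that produces one interval per candidate subset of instruments is not reproduced here.

```python
def union_of_intervals(intervals):
    """Merge closed intervals (lo, hi) into a sorted list of disjoint
    intervals covering exactly the same set of values."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

For example, intervals (3, 4), (0, 1), and (0.5, 2), obtained under three different hypothetical validity assumptions, merge to [(0, 2), (3, 4)]. The result can be disconnected, which is one way such a robust interval signals that different instrument subsets point to different effects.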

Preprint, 2016, arXiv

A Point-process Response Model for Spike Trains from Single Neurons in Neural Circuits under Optogenetic Stimulation

Optogenetics is a new tool to study neuronal circuits that have been genetically modified to allow stimulation by flashes of light. We study recordings from single neurons within neural circuits under optogenetic stimulation. The data from these experiments present a statistical challenge of modeling a high frequency point process (neuronal spikes) while the input is another high frequency point process (light flashes). We further develop a generalized linear model approach to model the relationships between two point processes, employing additive point-process response functions. The resulting model, Point-process Responses for Optogenetics (PRO), provides explicit nonlinear transformations to link the input point process with the output one. Such response functions may provide important and interpretable scientific insights into the properties of the biophysical process that governs neural spiking in response to optogenetic stimulation. We validate and compare the PRO model using a real dataset and simulations, and our model yields a superior area-under-the-curve value as high as 93% for predicting every future spike. For our experiment on the recurrent layer V circuit in the prefrontal cortex, the PRO model provides evidence that neurons integrate their inputs in a sophisticated manner. Another use of the model is that it enables understanding how neural circuits are altered under various disease conditions and/or experimental conditions by comparing the PRO parameters.

Preprint, 2016, arXiv

A simple and robust confidence interval for causal effects with possibly invalid instruments

Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. However, in practice, some of the putative instrumental variables are likely to be invalid. This paper presents a simple and general approach to construct a confidence interval that is robust to possibly invalid instruments. The robust confidence interval has theoretical guarantees on having the correct coverage and can also be used to assess the sensitivity of inference when instrumental variables assumptions are violated. The paper also shows that the robust confidence interval outperforms traditional confidence intervals popular in instrumental variables literature when invalid instruments are present. The new approach is applied to a developmental economics study of the causal effect of income on food expenditures.

Preprint, 2016, arXiv

Estimating the Malaria Attributable Fever Fraction Accounting for Parasites Being Killed by Fever and Measurement Error

Malaria is a parasitic disease that is a major health problem in many tropical regions. The most characteristic symptom of malaria is fever. The fraction of fevers that are attributable to malaria, the malaria attributable fever fraction (MAFF), is an important public health measure for assessing the effect of malaria control programs and other purposes. Estimating the MAFF is not straightforward because there is no gold standard diagnosis of a malaria attributable fever; an individual can have malaria parasites in her blood and a fever, but the individual may have developed partial immunity that allows her to tolerate the parasites and the fever is being caused by another infection. We define the MAFF using the potential outcome framework for causal inference and show what assumptions underlie current estimation methods. Current estimation methods rely on an assumption that the parasite density is correctly measured. However, this assumption does not generally hold because (i) fever kills some parasites and (ii) the measurement of parasite density has measurement error. In the presence of these problems, we show current estimation methods do not perform well. We propose a novel maximum likelihood estimation method based on exponential family g-modeling. Under the assumption that the measurement error mechanism and the magnitude of the fever killing effect are known, we show that our proposed method provides approximately unbiased estimates of the MAFF in simulation studies. A sensitivity analysis can be used to assess the impact of different magnitudes of fever killing and different measurement error mechanisms. We apply our proposed method to estimate the MAFF in Kilombero, Tanzania.

Preprint, 2016, arXiv

Mediation Analysis for Count and Zero-Inflated Count Data without Sequential Ignorability and Its Application in Dental Studies

Mediation analysis seeks to understand the mechanism by which a treatment affects an outcome. Count or zero-inflated count outcomes are common in many studies in which mediation analysis is of interest. For example, in dental studies, outcomes such as decayed, missing and filled teeth are typically zero-inflated. Existing mediation analysis approaches for count data assume sequential ignorability of the mediator. This is often not plausible because the mediator is not randomized, so there are unmeasured confounders associated with the mediator and the outcome. In this paper, we develop causal methods based on instrumental variable (IV) approaches for mediation analysis of count data, possibly with many zeros, that do not require the assumption of sequential ignorability. We first define the direct and indirect effect ratios for those data, and then propose estimating equations and use empirical likelihood to estimate the direct and indirect effects consistently. A sensitivity analysis is proposed for violations of the IV exclusion restriction assumption. Simulation studies demonstrate that our method works well for different types of outcomes under different settings. Our method is applied to a randomized dental caries prevention trial and a study of the effect of a massive flood in Bangladesh on children's diarrhea.

Preprint, 2016, arXiv

Protocol for an Observational Study on the Effects of Playing High School Football on Later Life Cognitive Functioning and Mental Health

A potential causal relationship between head injuries sustained by NFL players and later-life neurological decline may have broad implications for participants in youth and high school football programs. However, brain trauma risk at the professional level may differ from that at the youth and high school levels, and the long-term effects of participation at these levels are as yet unclear. To investigate the effect of playing high school football on later-life depression and cognitive functioning, we propose a retrospective observational study using data from the Wisconsin Longitudinal Study (WLS) of graduates from Wisconsin high schools in 1957. We compare 1,153 high school males who played varsity football to 2,751 male students who did not. Of the control subjects, 1,951 did not play any sport and the remaining 800 played a non-contact sport. We focus on two primary outcomes measured at age 65: a composite cognitive outcome measuring verbal fluency and memory, and the modified CES-D depression score. To control for potential confounders, we adjust for pre-exposure covariates such as IQ with matching and model-based covariate adjustment. We will conduct an ordered testing procedure that uses all 2,751 controls while controlling for possible unmeasured differences between students who played sports and those who did not. We will quantitatively assess the sensitivity of the results to potential unmeasured confounding. The study will also consider several secondary outcomes of clinical interest, such as aggression and heavy drinking. The rich set of pre-exposure variables, relatively unbiased sampling, and longitudinal nature of the WLS dataset make the proposed analysis unique among related studies that rely primarily on convenience samples of football players with reported neurological symptoms.

Preprint, 2016, arXiv

Using an Instrumental Variable to Test for Unmeasured Confounding

An important concern in an observational study is whether or not there is unmeasured confounding, i.e., unmeasured ways in which the treatment and control groups differ before treatment that affect the outcome. We develop a test of whether there is unmeasured confounding when an instrumental variable (IV) is available. An IV is a variable that is independent of the unmeasured confounding and encourages a subject to take one treatment level vs. another, while having no effect on the outcome beyond its encouragement of a certain treatment level. We show what types of unmeasured confounding can be tested for with an IV and develop a test for this type of unmeasured confounding that has correct type I error rate. We show that the widely used Durbin-Wu-Hausman (DWH) test can have inflated type I error rates when there is treatment effect heterogeneity. Additionally, we show that our test provides more insight into the nature of the unmeasured confounding than the DWH test. We apply our test to an observational study of the effect of a premature infant being delivered in a high-level neonatal intensive care unit (one with mechanical assisted ventilation and high volume) vs. a lower level unit, using the excess travel time a mother lives from the nearest high-level unit to the nearest lower-level unit as an IV.

Preprint, 2015, arXiv

Discrete Optimization for Interpretable Study Populations and Randomization Inference in an Observational Study of Severe Sepsis Mortality

Motivated by an observational study of the effect of hospital ward versus intensive care unit admission on severe sepsis mortality, we develop methods to address two common problems in observational studies: (1) when there is a lack of covariate overlap between the treated and control groups, how to define an interpretable study population wherein inference can be conducted without extrapolating with respect to important variables; and (2) how to use randomization inference to form confidence intervals for the average treatment effect with binary outcomes. Our solution to problem (1) incorporates existing suggestions in the literature while yielding a study population that is easily understood in terms of the covariates themselves, and can be solved using an efficient branch-and-bound algorithm. We address problem (2) by solving a linear integer program to utilize the worst-case variance of the average treatment effect among values for unobserved potential outcomes that are compatible with the null hypothesis. Our analysis finds no evidence for a difference between the sixty-day mortality rates if all individuals were admitted to the ICU and if all patients were admitted to the hospital ward among less severely ill patients and among patients with cryptic septic shock. We implement our methodology in R, providing scripts in the supplementary material.

Preprint, 2015, arXiv

Equivalence testing for functional data with an application to comparing pulmonary function devices

Equivalence testing for scalar data has been well addressed in the literature; however, the same cannot be said for functional data. The resultant complexity from maintaining the functional structure of the data, rather than using a scalar transformation to reduce dimensionality, renders the existing literature on equivalence testing inadequate for the desired inference. We propose a framework for equivalence testing for functional data within both the frequentist and Bayesian paradigms. This framework combines extensions of scalar methodologies with new methodology for functional data. Our frequentist hypothesis test extends the Two One-Sided Testing (TOST) procedure for equivalence testing to the functional regime. We conduct this TOST procedure through the use of the nonparametric bootstrap. Our Bayesian methodology employs a functional analysis of variance model, and uses a flexible class of Gaussian Processes both to model our data and as prior distributions. Through our analysis, we introduce a model for heteroscedastic variances within a Gaussian Process by modeling variance curves via Log-Gaussian Process priors. We stress the importance of choosing prior distributions that are commensurate with the prior state of knowledge and evidence regarding practical equivalence. We illustrate these testing methods through data from an ongoing method comparison study between two devices for pulmonary function testing. In so doing, we provide not only concrete motivation for equivalence testing for functional data, but also a blueprint for researchers who hope to conduct similar inference.
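A bootstrap version of TOST for curves can be sketched as follows. Resample curves within each device, compute the mean difference curve, and take pointwise alpha and 1 - alpha percentiles, which give a (1 - 2 alpha) interval at each grid point. Equivalence is declared only if that interval lies strictly inside (-delta, delta) at every grid point. Using pointwise percentiles without a simultaneous-band adjustment, the margin `delta`, and the function name are simplifying assumptions; the paper's exact bootstrap construction may differ.

```python
import random
from statistics import mean


def functional_tost(curves_a, curves_b, delta, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap TOST sketch: True if the bootstrap (1 - 2*alpha) interval of
    the mean difference curve lies inside (-delta, delta) at every grid point.
    Each curve is a list of values on a shared grid."""
    rng = random.Random(seed)
    n_points = len(curves_a[0])

    def mean_diff(a, b):
        return [mean(c[t] for c in a) - mean(c[t] for c in b)
                for t in range(n_points)]

    boots = []
    for _ in range(n_boot):
        # Resample whole curves with replacement within each group.
        a = [rng.choice(curves_a) for _ in curves_a]
        b = [rng.choice(curves_b) for _ in curves_b]
        boots.append(mean_diff(a, b))

    for t in range(n_points):
        vals = sorted(d[t] for d in boots)
        lo = vals[int(alpha * (n_boot - 1))]
        hi = vals[int((1 - alpha) * (n_boot - 1))]
        if lo <= -delta or hi >= delta:
            return False  # equivalence not shown at this grid point
    return True
```

Keeping whole curves together during resampling preserves the within-curve correlation across grid points. That is the point of testing the functional data directly rather than reducing each curve to a single scalar.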

Preprint, 2015, arXiv

Full Matching Approach to Instrumental Variables Estimation with Application to the Effect of Malaria on Stunting

Most previous studies of the causal relationship between malaria and stunting have been studies where potential confounders are controlled via regression-based methods, but these studies may have been biased by unobserved confounders. Instrumental variables (IV) regression offers a way to control for unmeasured confounders where, in our case, the sickle cell trait can be used as an instrument. However, for the instrument to be valid, it may still be important to account for measured confounders. The most commonly used instrumental variable regression method, two-stage least squares, relies on parametric assumptions on the effects of measured confounders to account for them. Additionally, two-stage least squares lacks transparency with respect to covariate balance and weighting of subjects and does not blind the researcher to the outcome data. To address these drawbacks, we propose an alternative method for IV estimation based on full matching. We evaluate our new procedure on simulated data and real data concerning the causal effect of malaria on stunting among children. We estimate that the risk of stunting among children with the sickle cell trait decreases by 0.22 times the average number of malaria episodes prevented by the sickle cell trait, a substantial effect of malaria on stunting (p-value: 0.011, 95% CI: 0.044, 1).

Preprint, 2015, arXiv

Isolation in the construction of natural experiments

A natural experiment is a type of observational study in which treatment assignment, though not randomized by the investigator, is plausibly close to random. A process that assigns treatments in a highly nonrandom, inequitable manner may, in rare and brief moments, assign aspects of treatments at random or nearly so. Isolating those moments and aspects may extract a natural experiment from a setting in which treatment assignment is otherwise quite biased, far from random. Isolation is a tool that focuses on those rare, brief instances, extracting a small natural experiment from otherwise useless data. We discuss the theory behind isolation and illustrate its use in a reanalysis of a well-known study of the effects of fertility on workforce participation. Whether a woman becomes pregnant at a certain moment in her life and whether she brings that pregnancy to term may reflect her aspirations for family, education and career, the degree of control she exerts over her fertility, and the quality of her relationship with the father; moreover, these aspirations and relationships are unlikely to be recorded with precision in surveys and censuses, and they may confound studies of workforce participation. However, given that a woman is pregnant and will bring the pregnancy to term, whether she will have twins or a single child is, to a large extent, simply luck. Given that a woman is pregnant at a certain moment, the differential comparison of two types of pregnancies on workforce participation, twins or a single child, may be close to randomized, not biased by unmeasured aspirations. In this comparison, we find in our case study that mothers of twins had more children but only slightly reduced workforce participation, approximately 5% less time at work for an additional child.

Preprint, 2015, arXiv

Sensitivity Analysis for Multiple Comparisons in Matched Observational Studies through Quadratically Constrained Linear Programming

A sensitivity analysis in an observational study assesses the robustness of significant findings to unmeasured confounding. While sensitivity analyses in matched observational studies have been well addressed when there is a single outcome variable, accounting for multiple comparisons through the existing methods yields overly conservative results when there are multiple outcome variables of interest. This stems from the fact that unmeasured confounding cannot affect the probability of assignment to treatment differently depending on the outcome being analyzed. Existing methods allow this to occur by combining the results of individual sensitivity analyses to assess whether at least one hypothesis is significant, which in turn results in an overly pessimistic assessment of a study's sensitivity to unobserved biases. By solving a quadratically constrained linear program, we are able to perform a sensitivity analysis while enforcing that unmeasured confounding must have the same impact on the treatment assignment probabilities across outcomes for each individual in the study. We show that this allows for uniform improvements in the power of a sensitivity analysis not only for testing the overall null of no effect, but also for null hypotheses on specific outcome variables while strongly controlling the familywise error rate. We illustrate our method through an observational study on the effect of smoking on naphthalene exposure.
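For context, here is the single-outcome building block that such methods generalize: Rosenbaum's bound for the sign test (McNemar-type test) in matched pairs with a binary outcome. If hidden bias is at most gamma, the chance that a discordant pair favors treatment is at most gamma / (1 + gamma). The worst-case one-sided p-value is therefore a binomial tail at that probability. This sketch covers one outcome only; it does not implement the quadratically constrained linear program that couples several outcomes, and the function name is hypothetical.

```python
from math import comb


def sign_test_upper_p(t, d, gamma):
    """Upper bound on the one-sided sign-test p-value with d discordant
    pairs, t of which favor treatment, when hidden bias is at most gamma."""
    # Worst-case probability that a discordant pair favors treatment.
    p_plus = gamma / (1 + gamma)
    return sum(comb(d, k) * p_plus ** k * (1 - p_plus) ** (d - k)
               for k in range(t, d + 1))
```

At gamma = 1 (no hidden bias) this is the usual exact sign-test p-value; for example, 10 of 10 discordant pairs favoring treatment gives 1/1024. Increasing gamma can only raise the bound. Combining such per-outcome bounds separately, as if each outcome could face a different worst case, is what makes a naive multiple-outcome analysis conservative.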

Preprint, 2014, arXiv

Estimation of causal effects using instrumental variables with nonignorable missing covariates: Application to effect of type of delivery NICU on premature infants

Understanding how effective high-level NICUs (neonatal intensive care units that have the capacity for sustained mechanical assisted ventilation and high volume) are compared with low-level NICUs is important for both individual mothers and public policy decisions. The goal of this paper is to estimate the effect on mortality of premature babies being delivered in a high-level NICU vs. a low-level NICU through an observational study in which there are unmeasured confounders as well as nonignorable missing covariates. We consider the use of excess travel time as an instrumental variable (IV) to control for unmeasured confounders. For an IV to be valid, we must condition on confounders of the IV-outcome relationship; for example, the month in which prenatal care started must be conditioned on for excess travel time to be a valid IV. However, the month prenatal care started is sometimes missing, and the missingness may be nonignorable because it is related to the mother's or infant's risk of complications, which is not fully measured. We develop a method to estimate the causal effect of a treatment using an IV when there are nonignorable missing covariates, as in our data, where we allow the missingness to depend on the fully observed outcome as well as on the partially observed compliance class, which is a proxy for the unmeasured risk of complications. A simulation study shows that under our nonignorable missingness assumption, the commonly used estimation methods, complete-case analysis and multiple imputation by chained equations assuming missingness at random, provide biased estimates, while our method provides approximately unbiased estimates. We apply our method to the NICU study and find evidence that high-level NICUs significantly reduce deaths for babies of low gestational age, whereas for nearly mature babies, such as those born at 37 weeks, the level of NICU makes little difference. A sensitivity analysis assesses how sensitive our conclusions are to key assumptions about the missing covariates. The method we develop may be useful for many observational studies that, like ours, face unmeasured confounding and nonignorable missing data.
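The IV idea in this abstract can be made concrete with the textbook Wald (ratio) estimator for a binary instrument. The sketch below is only that baseline: it does not condition on covariates and ignores missing data, which is exactly what the paper's method addresses. The function name and the toy data (z = 1 for low excess travel time, d = 1 for delivery at a high-level NICU, y = 1 for death) are hypothetical.

```python
from statistics import fmean


def wald_estimate(z, d, y):
    """Wald (ratio) IV estimate with a binary instrument z.

    Under the usual IV assumptions (random instrument, exclusion restriction,
    monotonicity) this estimates the complier average causal effect of d on y.
    No covariate adjustment and no handling of missing data.
    """
    y_z1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y_z0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d_z1 = [di for zi, di in zip(z, d) if zi == 1]
    d_z0 = [di for zi, di in zip(z, d) if zi == 0]
    itt_outcome = fmean(y_z1) - fmean(y_z0)    # effect of the instrument on the outcome
    itt_treatment = fmean(d_z1) - fmean(d_z0)  # effect of the instrument on treatment uptake
    if itt_treatment == 0:
        raise ValueError("instrument does not shift treatment; the ratio is undefined")
    return itt_outcome / itt_treatment


# Hypothetical toy data: z = low excess travel time, d = high-level NICU, y = death
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [0, 0, 1, 1, 1, 1, 1, 0]
print(wald_estimate(z, d, y))  # -0.5
```

The ratio divides the instrument's effect on mortality by its effect on the chance of delivery at a high-level NICU; with a weak instrument the denominator is near zero and the estimate becomes unstable.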

preprint, 2014, arXiv

Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization

Instrumental variables have been widely used to estimate the causal effect of an exposure on an outcome. Conventional estimation methods require complete knowledge of all the instruments' validity; a valid instrument must have no direct effect on the outcome and must not be related to unmeasured confounders. Often this is impractical, as highlighted by Mendelian randomization studies, where genetic markers are used as instruments and complete knowledge of the instruments' validity is equivalent to complete knowledge of the involved genes' functions. In this paper, we propose a method for estimating causal effects when this complete knowledge is absent. We show that causal effects are identified and can be estimated as long as fewer than 50% of the instruments are invalid, without knowing which of the instruments are invalid. We also introduce conditions for identification when the 50% threshold is violated. A fast penalized $\ell_1$ estimation method, called sisVIVE, is introduced for estimating the causal effect without knowing which instruments are valid, with theoretical guarantees on its performance. The proposed method is demonstrated on simulated data and on a real Mendelian randomization study concerning the effect of body mass index on a health-related quality of life index. An R package, sisVIVE, is available online.
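Why a majority of valid instruments is enough can be seen with a much simpler rule than sisVIVE: for each instrument, the ratio of its association with the outcome to its association with the exposure equals the causal effect when the instrument is valid, so if more than half of the instruments are valid, the median ratio equals the causal effect. The sketch below uses exact, noise-free associations that we made up; it illustrates the identification argument only and is not the penalized $\ell_1$ estimator from the paper or the interface of the sisVIVE package. The function name is hypothetical.

```python
from statistics import median


def median_ratio_estimate(gamma_outcome, gamma_exposure):
    """Median of per-instrument ratios gamma_outcome[j] / gamma_exposure[j].

    A valid instrument's ratio equals the causal effect; an invalid instrument's
    ratio is shifted by its direct effect on the outcome. When more than half of
    the instruments are valid, the median picks out the common value.
    """
    ratios = [g_y / g_d for g_y, g_d in zip(gamma_outcome, gamma_exposure)]
    return median(ratios)


beta = 2.0                                  # hypothetical true causal effect
gamma_exposure = [1.0, 0.5, 2.0, 1.0, 0.8]  # instrument-exposure associations
direct = [0.0, 0.0, 0.0, 3.0, -1.0]         # last two instruments are invalid
gamma_outcome = [beta * g + a for g, a in zip(gamma_exposure, direct)]
print(median_ratio_estimate(gamma_outcome, gamma_exposure))  # 2.0
```

With three of five instruments valid, the two invalid ratios (5.0 and about 0.75) land on either side of the three copies of 2.0, so the median is unaffected. With real data each ratio is estimated with noise, which is one reason the paper develops a dedicated estimator with performance guarantees.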

preprint, 2013, arXiv

Stronger instruments via integer programming in an observational study of late preterm birth outcomes

In an optimal nonbipartite match, a single population is divided into matched pairs to minimize the total distance within matched pairs. Nonbipartite matching has been used to strengthen instrumental variables in observational studies of treatment effects, essentially by forming pairs that are similar in terms of covariates but very different in the strength of encouragement to accept the treatment. Optimal nonbipartite matching is typically done using network optimization techniques that can be quick, running in polynomial time, but these techniques limit the tools available for matching. Instead, we use integer programming techniques, thereby obtaining a wealth of new tools not previously available for nonbipartite matching, including fine and near-fine balance for several nominal variables, forced near balance on means, and optimal subsetting. We illustrate the methods in our ongoing study of outcomes of late-preterm births in California, that is, births at 34 to 36 weeks of gestation. Would lengthening the hospital stay for such births reduce the frequency of rapid readmissions? A straightforward comparison of babies who stay for a shorter or longer time would be severely biased, because the principal reason for a long stay is some serious health problem. We need an instrument: something inconsequential and haphazard that encourages a shorter or a longer stay in the hospital. It turns out that babies born at certain times of day tend to stay overnight once, giving a shorter length of stay, whereas babies born at other times of day tend to stay overnight twice, giving a longer length of stay, and there is nothing particularly special about a baby who is born at 11:00 pm.
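The objective in the first sentence, splitting one population into pairs so that the total within-pair distance is as small as possible, can be written down directly as an exhaustive search. The sketch below is feasible only for a handful of units; it is neither the network-optimization approach nor the integer program the authors use, and the distance matrix is invented. In an IV application the distance would be small for pairs with similar covariates but very different encouragement (for example, time of birth).

```python
from functools import lru_cache


def optimal_nonbipartite_match(dist):
    """Exhaustive optimal nonbipartite matching for a small population.

    dist: symmetric n x n list of lists with n even.
    Returns (total within-pair distance, list of index pairs).
    The search is exponential in n, so it only suits toy examples.
    """
    n = len(dist)
    if n % 2:
        raise ValueError("an even number of units is required")

    @lru_cache(maxsize=None)
    def best(remaining):
        # remaining: sorted tuple of units not yet paired
        if not remaining:
            return 0.0, ()
        first, rest = remaining[0], remaining[1:]
        best_total, best_pairs = float("inf"), ()
        for k, partner in enumerate(rest):
            # pair `first` with `partner`, then solve the rest optimally
            sub_total, sub_pairs = best(rest[:k] + rest[k + 1:])
            total = dist[first][partner] + sub_total
            if total < best_total:
                best_total, best_pairs = total, ((first, partner),) + sub_pairs
        return best_total, best_pairs

    total, pairs = best(tuple(range(n)))
    return total, list(pairs)


# Hypothetical distances among four units
dist = [[0, 1, 4, 5],
        [1, 0, 6, 2],
        [4, 6, 0, 3],
        [5, 2, 3, 0]]
print(optimal_nonbipartite_match(dist))  # (4.0, [(0, 1), (2, 3)])
```

Of the three possible pairings of four units (totals 4, 6 and 11), the search returns the cheapest. Constraints such as fine balance are awkward to add to this kind of search, which is the gap the integer-programming formulation fills.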

preprint, 2012, arXiv

Mediation Analysis Without Sequential Ignorability: Using Baseline Covariates Interacted with Random Assignment as Instrumental Variables

In randomized trials, researchers are often interested in mediation analysis to understand how a treatment works, in particular how much of a treatment's effect is mediated by an intermediate variable and how much the treatment directly affects the outcome, not through the mediator. The standard regression approach to mediation analysis assumes sequential ignorability of the mediator, that is, that the mediator is effectively randomly assigned given baseline covariates and the randomized treatment. Since the experiment does not randomize the mediator, sequential ignorability is often not plausible. Ten Have et al. (2007, Biometrics), Dunn and Bentall (2007, Statistics in Medicine) and Albert (2008, Statistics in Medicine) presented methods that use baseline covariates interacted with random assignment as instrumental variables and do not require sequential ignorability. We make two contributions to this approach. First, previous work on the instrumental variable approach has assumed that the direct effect of treatment and the effect of the mediator are constant across subjects; we allow for variation in effects across subjects and show what assumptions are needed to obtain consistent estimates in this setting. Second, we develop a method of sensitivity analysis for violations of the key assumption that the direct effect of the treatment and the effect of the mediator do not depend on the baseline covariates.
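A minimal sketch of the instrumental-variable idea this abstract builds on, under assumptions chosen for illustration: a linear outcome model with a constant direct effect theta and a constant mediator effect beta, a randomized treatment r, a baseline covariate x whose interaction with r shifts the mediator m, and an unmeasured confounder u of m and y. Using r and r*x as instruments in two-stage least squares recovers theta and beta, while ordinary least squares on the same regressors does not. This is not the authors' heterogeneous-effects method or their sensitivity analysis; all variable names and coefficients are hypothetical.

```python
import numpy as np


def two_stage_least_squares(y, regressors, instruments):
    """Generic 2SLS: project the regressors onto the instruments, then run OLS."""
    first_stage, *_ = np.linalg.lstsq(instruments, regressors, rcond=None)
    fitted = instruments @ first_stage   # exogenous columns pass through unchanged
    coef, *_ = np.linalg.lstsq(fitted, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                 # baseline covariate
r = rng.integers(0, 2, size=n)         # randomized treatment
u = rng.normal(size=n)                 # unmeasured mediator-outcome confounder
m = 0.5 * r + 0.8 * r * x + u + rng.normal(size=n)  # effect of r on m varies with x
theta, beta = 1.0, 2.0                 # true direct effect and mediator effect
y = theta * r + beta * m + 0.3 * x + 2.0 * u + rng.normal(size=n)

ones = np.ones(n)
regressors = np.column_stack([ones, r, m, x])
instruments = np.column_stack([ones, r, r * x, x])  # r * x is the excluded instrument for m
iv_coef = two_stage_least_squares(y, regressors, instruments)
ols_coef, *_ = np.linalg.lstsq(regressors, y, rcond=None)
print("2SLS theta, beta:", iv_coef[1:3])   # close to (1.0, 2.0)
print("OLS  theta, beta:", ols_coef[1:3])  # both distorted by the confounder u
```

The instrument r*x is only useful because, in this simulation, the effect of r on m depends on x; if that interaction were zero, the model would not be identified. The constant-effects outcome model is exactly the assumption the paper relaxes.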

preprint, 2011, arXiv

Causal inference for continuous-time processes when covariates are observed only at discrete times

Most of the work on structural nested models and g-estimation for causal inference in longitudinal data assumes a discrete-time underlying data-generating process. However, in some observational studies, it is more reasonable to assume that the data are generated by a continuous-time process and are observable only at discrete time points. In these circumstances, the sequential randomization assumption in the observed discrete-time data, which is essential for justifying discrete-time g-estimation, may not be reasonable. Under a deterministic model, we discuss other useful assumptions that guarantee the consistency of discrete-time g-estimation. In more general cases, when those assumptions are violated, we propose a controlling-the-future method that performs at least as well as g-estimation in most scenarios and that provides consistent estimation in some cases where g-estimation is severely inconsistent. We apply the methods discussed in this paper to simulated data, as well as to a data set collected following a massive flood in Bangladesh, estimating the effect of diarrhea on children's height. Results from the different methods are compared in both the simulations and the real application.
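For orientation, here is the simplest discrete-time case the abstract starts from: one treatment decision, a measured confounder, a known treatment probability p given that confounder, and a structural nested mean model in which treatment shifts the outcome by a constant psi. g-estimation chooses psi so that the "treatment-removed" outcome y - psi * a is uncorrelated with a - p; with one parameter this has a closed form. The simulated data and names are hypothetical, and the sketch says nothing about the continuous-time issues or the controlling-the-future method studied in the paper.

```python
import numpy as np


def g_estimate_constant_effect(y, a, p):
    """Closed-form g-estimate of psi in the model y = psi * a + y0.

    Solves sum((y - psi * a) * (a - p)) = 0, where p = P(a = 1 | confounder)
    is known. Under no unmeasured confounding, y0 is independent of a given the
    confounder, so the estimating equation has mean zero at the true psi.
    """
    resid_a = a - p
    return np.sum(y * resid_a) / np.sum(a * resid_a)


rng = np.random.default_rng(1)
n = 20_000
confounder = rng.integers(0, 2, size=n)              # measured confounder
p = 0.3 + 0.4 * confounder                           # known treatment probability
a = (rng.random(n) < p).astype(float)                # treatment depends on the confounder
y = 1.5 * a + 2.0 * confounder + rng.normal(size=n)  # true psi = 1.5

psi_hat = g_estimate_constant_effect(y, a, p)
naive = y[a == 1].mean() - y[a == 0].mean()          # confounded comparison
print(psi_hat, naive)  # psi_hat near 1.5; naive near 2.3
```

The naive treated-minus-untreated difference absorbs the confounder's effect, whereas the g-estimate uses the known treatment probabilities to remove it. The abstract's point is that when the true process runs in continuous time, the sequential randomization assumption behind this estimating equation may fail for the discretely observed data.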

preprint, 2011, arXiv

The effect of winning an Oscar Award on survival: Correcting for healthy performer survivor bias with a rank preserving structural accelerated failure time model

We study the causal effect of winning an Oscar Award on an actor's or actress's survival. Does the increase in social rank from winning an Oscar increase a performer's life expectancy? Previous studies of this issue have suffered from healthy performer survivor bias: healthier candidates are able to act in more films and so have more chances to win Oscar Awards. To correct this bias, we adapt Robins' rank preserving structural accelerated failure time model and $g$-estimation method. We show in simulation studies that this approach corrects the bias contained in previous studies. We estimate that the effect of winning an Oscar Award on survival is 4.2 years, with a 95% confidence interval of $[-0.4,8.4]$ years. There is not strong evidence that winning an Oscar increases life expectancy.
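In Robins' rank preserving structural accelerated failure time model, each performer's observed survival time T is mapped to a counterfactual "never won" time U(psi) = integral from 0 to T of exp(psi * A(t)) dt, where A(t) = 1 after the win and 0 before. Because A switches on at most once, the integral reduces to the time before the win plus exp(psi) times the time after it; with this sign convention, exp(psi) < 1 means winning extends survival. g-estimation then searches for the psi at which U(psi) is balanced across treatment histories; that search and the handling of censoring are omitted here. This is a minimal sketch of the transformation only; the function name and numbers are hypothetical.

```python
import math


def counterfactual_time(survival_time, win_time, psi):
    """Map observed survival time T to the counterfactual untreated time U(psi).

    survival_time: time from the start of follow-up to death
    win_time: time of the first Oscar win, or None if the performer never won
    psi: structural parameter; exp(psi) < 1 means winning extends survival
    """
    if win_time is None or win_time >= survival_time:
        return survival_time  # never treated during follow-up: U = T
    time_before = win_time
    time_after = survival_time - win_time
    return time_before + math.exp(psi) * time_after


print(counterfactual_time(80.0, 50.0, math.log(0.5)))  # about 65.0
print(counterfactual_time(80.0, None, math.log(0.5)))  # 80.0
```

Treating the win as a time-varying exposure, rather than comparing winners with non-winners at baseline, is what lets this model avoid crediting the Oscar with years that healthier performers would have lived anyway.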

preprint, 2010, arXiv

Hidden Markov models for alcoholism treatment trial data

In a clinical trial of a treatment for alcoholism, a common response variable of interest is the number of alcoholic drinks consumed by each subject each day, or an ordinal version of this response, with levels corresponding to abstinence, light drinking and heavy drinking. In these trials, within-subject drinking patterns are often characterized by alternating periods of heavy drinking and abstinence. For this reason, many statistical models for time series that assume steady behavior over time and white noise errors do not fit alcohol data well. In this paper we propose to describe subjects' drinking behavior using Markov models and hidden Markov models (HMMs), which are better suited to processes that make sudden, rather than gradual, changes over time. We incorporate random effects into these models using a hierarchical Bayes structure to account for correlated responses within subjects over time, and we estimate the effects of covariates, including a randomized treatment, on the outcome in a novel way. We illustrate the models by fitting them to a large data set from a clinical trial of the drug Naltrexone. The HMM, in particular, fits these data well and also has unique features that allow useful clinical interpretations of alcohol consumption behavior.
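The likelihood an HMM assigns to a subject's daily series comes from the forward recursion, which sums over all hidden-state paths in time linear in the length of the series. The sketch below shows only that recursion for a fixed-parameter HMM with two hidden states and the three ordinal levels from the abstract (0 = abstinence, 1 = light drinking, 2 = heavy drinking); it has no random effects, covariates or hierarchical Bayes layer, and every probability is made up.

```python
def forward_likelihood(obs, init, trans, emit):
    """P(obs) under a discrete HMM, computed with the forward recursion.

    init[s]: P(first hidden state = s)
    trans[s][t]: P(next hidden state = t | current state = s)
    emit[s][o]: P(observed level = o | hidden state = s)
    """
    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[s] * trans[s][t] for s in range(n_states)) * emit[t][o]
            for t in range(n_states)
        ]
    return sum(alpha)


# Hypothetical parameters: state 0 mostly abstinent, state 1 mostly heavy drinking
init = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
emit = [[0.8, 0.15, 0.05],
        [0.1, 0.3, 0.6]]
print(forward_likelihood([0, 2], init, trans, emit))  # about 0.0665
```

The "sticky" transition matrix (large diagonal entries) is what lets an HMM reproduce alternating runs of abstinence and heavy drinking. For long series, practical implementations rescale alpha at each step or work on the log scale to avoid underflow.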

27 published items

Randomization Inference for Cluster-Randomized Test-Negative Designs with Application to Dengue Studies: Unbiased estimation, Partial compliance, and Stepped-wedge design

A Nonparametric Likelihood Approach for Inference in Instrumental Variable Models

A Test for Differential Ascertainment in Case-Control Studies with Application to Child Maltreatment

Constructing a More Closely Matched Control Group in a Difference-in-Differences Analysis: Its Effect on History Interacting with Group Bias

ivmodel: An R Package for Inference and Sensitivity Analysis of Instrumental Variables Models with One Endogenous Variable

Protocol for a Study of the Effect of Surface Mining in Central Appalachia on Adverse Birth Outcomes

Protocol for an Observational Study on the Effects of Early-Life Participation in Contact Sports on Later-Life Cognition in a Sample of Monozygotic and Dizygotic Swedish Twins Reared Together and Twins Reared Apart

Protocol for an Observational Study on the Effects of Social Distancing on Influenza-Like Illness and COVID-19

Two Robust Tools for Inference about Causal Effects with Invalid Instruments

A Point-process Response Model for Spike Trains from Single Neurons in Neural Circuits under Optogenetic Stimulation

A simple and robust confidence interval for causal effects with possibly invalid instruments

Estimating the Malaria Attributable Fever Fraction Accounting for Parasites Being Killed by Fever and Measurement Error

Mediation Analysis for Count and Zero-Inflated Count Data without Sequential Ignorability and Its Application in Dental Studies

Protocol for an Observational Study on the Effects of Playing High School Football on Later Life Cognitive Functioning and Mental Health

Using an Instrumental Variable to Test for Unmeasured Confounding

Discrete Optimization for Interpretable Study Populations and Randomization Inference in an Observational Study of Severe Sepsis Mortality

Equivalence testing for functional data with an application to comparing pulmonary function devices

Full Matching Approach to Instrumental Variables Estimation with Application to the Effect of Malaria on Stunting

Isolation in the construction of natural experiments

Sensitivity Analysis for Multiple Comparisons in Matched Observational Studies through Quadratically Constrained Linear Programming

Estimation of causal effects using instrumental variables with nonignorable missing covariates: Application to effect of type of delivery NICU on premature infants

Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization

Stronger instruments via integer programming in an observational study of late preterm birth outcomes

Mediation Analysis Without Sequential Ignorability: Using Baseline Covariates Interacted with Random Assignment as Instrumental Variables

Causal inference for continuous-time processes when covariates are observed only at discrete times

The effect of winning an Oscar Award on survival: Correcting for healthy performer survivor bias with a rank preserving structural accelerated failure time model

Hidden Markov models for alcoholism treatment trial data