Researcher profile

Dylan S. Small

Dylan S. Small contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
13 works · 0 followers · 5 topics · 4 close collaborators


Published work

13 published items

Preprint, 2022, arXiv

Randomization Inference for Cluster-Randomized Test-Negative Designs with Application to Dengue Studies: Unbiased estimation, Partial compliance, and Stepped-wedge design

In 2019, the World Health Organization identified dengue as one of the top ten global health threats. For the control of dengue, the Applying Wolbachia to Eliminate Dengue (AWED) study group conducted a cluster-randomized trial in Yogyakarta, Indonesia, and used a novel design, called the cluster-randomized test-negative design (CR-TND). This design can yield valid statistical inference with data collected by a passive surveillance system and thus has the advantage of cost-efficiency compared to traditional cluster-randomized trials. We investigate the statistical assumptions and properties of CR-TND under a randomization inference framework, which is known to be robust and efficient for small-sample problems. We find that, when the differential healthcare-seeking behavior comparing intervention and control varies across clusters (in contrast to the setting of Dufault and Jewell, 2020 where the differential healthcare-seeking behavior is constant across clusters), current analysis methods for CR-TND can be biased and have inflated type I error. We propose the log-contrast estimator that can eliminate such bias and improve precision by adjusting for covariates. Furthermore, we extend our methods to handle partial intervention compliance and a stepped-wedge design, both of which appear frequently in cluster-randomized trials. Finally, we demonstrate our results by simulation studies and re-analysis of the AWED study.
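Randomization inference in a cluster-randomized trial takes its reference distribution from re-drawing the cluster-level treatment assignment. The sketch below illustrates only that general idea; it is not the log-contrast estimator proposed in the paper. It assumes each cluster reports counts of test-positive and test-negative participants, uses a hypothetical statistic (difference in mean cluster-level log ratios of positives to negatives), and enumerates every assignment with the same number of intervention clusters. The counts are invented.

```python
import itertools
import math


def cluster_log_ratio(test_positive, test_negative):
    """Log ratio of test-positive to test-negative counts in one cluster.

    Assumes both counts are positive (no zero-cell correction)."""
    return math.log(test_positive / test_negative)


def randomization_test(clusters, treated):
    """Exact one-sided randomization p-value for a cluster-randomized comparison.

    clusters: list of (test_positive, test_negative) counts, one per cluster.
    treated:  indices of clusters assigned to the intervention.
    Statistic: mean log ratio in control clusters minus mean log ratio in
    intervention clusters, so large values suggest a protective intervention.
    """
    scores = [cluster_log_ratio(p, n) for p, n in clusters]
    n_clusters = len(scores)

    def statistic(assigned):
        t = [scores[i] for i in range(n_clusters) if i in assigned]
        c = [scores[i] for i in range(n_clusters) if i not in assigned]
        return sum(c) / len(c) - sum(t) / len(t)

    observed = statistic(set(treated))
    # Reference distribution: every assignment with the same number of
    # intervention clusters, each equally likely under the sharp null.
    reference = [
        statistic(set(combo))
        for combo in itertools.combinations(range(n_clusters), len(treated))
    ]
    p_value = sum(s >= observed - 1e-12 for s in reference) / len(reference)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical counts for four clusters; clusters 0 and 1 are treated.
    counts = [(10, 90), (12, 88), (40, 60), (45, 55)]
    stat, p = randomization_test(counts, treated={0, 1})
    print(f"statistic={stat:.3f}, p={p:.3f}")
```

With only four clusters the smallest attainable p-value is 1/6, which is one reason small cluster-randomized trials call for careful design of the test.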

Preprint, 2020, arXiv

A Nonparametric Likelihood Approach for Inference in Instrumental Variable Models

Instrumental variable methods allow for inference about the treatment effect by controlling for unmeasured confounding in randomized experiments with noncompliance. However, many studies do not consider the observed compliance behavior in the testing procedure, which can lead to a loss of power. In this paper, we propose a novel nonparametric likelihood approach, referred to as the binomial likelihood (BL) method, that incorporates information on compliance behavior while overcoming several limitations of previous techniques and utilizing the advantages of likelihood methods. Our proposed method produces proper estimates of the counterfactual distribution functions by maximizing the binomial likelihood over the space of distribution functions. Using these estimates, we propose two versions of a binomial likelihood ratio test for the null hypothesis of no treatment effect. We show that both versions are more powerful than existing methods at detecting any distributional change in finite samples, and that they are asymptotically equivalent to the two-sample Anderson-Darling test. We also develop an efficient algorithm for computing our estimates, and apply the binomial likelihood method to a study of the effect of Medicaid coverage on mental health using the Oregon Health Insurance Experiment.
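Because the abstract benchmarks the BL tests against the two-sample Anderson-Darling test, a plain-Python sketch of that comparison statistic may be useful. It follows the Scholz and Stephens (1987) k-sample formula specialized to two samples, assumes continuous data with no ties, and computes only the statistic (no p-value). It is not the binomial likelihood test itself.

```python
def anderson_darling_2sample(x, y):
    """Two-sample Anderson-Darling statistic A^2 (Scholz-Stephens form).

    Assumes no ties in the pooled sample:
        A^2 = (1/N) * sum_i (1/n_i) * sum_{j=1}^{N-1}
              (N * M_ij - j * n_i)^2 / (j * (N - j)),
    where M_ij counts observations from sample i among the j smallest
    pooled values.
    """
    # Tag each value with its sample label (0 for x, 1 for y) and sort.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n_total = len(pooled)
    statistic = 0.0
    for sample, n_i in enumerate((len(x), len(y))):
        m_ij = 0  # running count of this sample among the j smallest values
        inner = 0.0
        for j in range(1, n_total):
            if pooled[j - 1][1] == sample:
                m_ij += 1
            inner += (n_total * m_ij - j * n_i) ** 2 / (j * (n_total - j))
        statistic += inner / n_i
    return statistic / n_total


if __name__ == "__main__":
    print(anderson_darling_2sample([1.0, 3.0], [2.0, 4.0]))  # interleaved samples
    print(anderson_darling_2sample([1.0, 2.0], [3.0, 4.0]))  # fully separated samples
```

Larger values indicate a bigger difference between the two empirical distributions, with the tails weighted more heavily than in a Kolmogorov-Smirnov-type statistic.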

Preprint, 2020, arXiv

A Test for Differential Ascertainment in Case-Control Studies with Application to Child Maltreatment

We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. We can alleviate the effect of such bias using the estimates that our method of testing for differential ascertainment naturally provides. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.
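A short worked illustration of why differential ascertainment matters: if true cases in one group are recorded with a different probability than in the other, the odds ratio computed from recorded cases is scaled by the ratio of those probabilities. The sketch assumes controls are fully observed and works with expected counts; the table is hypothetical, and this is not the test proposed in the paper.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 case-control table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)


def observed_odds_ratio(a, b, c, d, p_exposed, p_unexposed):
    """Expected odds ratio when only ascertained cases are counted.

    Each true case is recorded with probability p_exposed or p_unexposed;
    controls are assumed fully observed. Using expected counts, the result
    equals odds_ratio(a, b, c, d) * p_exposed / p_unexposed.
    """
    return odds_ratio(a * p_exposed, b, c * p_unexposed, d)


if __name__ == "__main__":
    # Hypothetical counts: the true odds ratio is 4.0.
    print(odds_ratio(100, 200, 50, 400))
    # Exposed cases are recorded half as often as unexposed cases.
    print(observed_odds_ratio(100, 200, 50, 400, p_exposed=0.5, p_unexposed=1.0))
```

Here halving the recording rate in one group halves the apparent odds ratio, from 4.0 to 2.0, even though nothing about the underlying association changed.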

Preprint, 2020, arXiv

Constructing a More Closely Matched Control Group in a Difference-in-Differences Analysis: Its Effect on History Interacting with Group Bias

Difference-in-differences analysis with a control group that differs considerably from a treated group is vulnerable to bias from historical events that have different effects on the groups. Constructing a more closely matched control group by matching a subset of the overall control group to the treated group may result in less bias. We study this phenomenon in simulation studies. We study the effect of mountaintop removal mining (MRM) on mortality using a difference-in-differences analysis that makes use of the increase in MRM following the 1990 Clean Air Act Amendments. For a difference-in-differences analysis of the effect of MRM on mortality, we constructed a more closely matched control group and found a 95% confidence interval that contains substantial adverse effects along with no effect and small beneficial effects.
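The difference-in-differences estimate, and the idea of restricting to a closer control group, can be sketched in a few lines. Here "closeness" is measured only by the pre-period outcome, a hypothetical simplification (the abstract does not list the matching variables), and all numbers are invented.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treated, control):
    """Difference-in-differences estimate.

    treated, control: lists of (pre, post) outcomes, one tuple per unit.
    Returns the mean change in the treated group minus the mean change
    in the control group.
    """
    treated_change = mean([post for _, post in treated]) - mean([pre for pre, _ in treated])
    control_change = mean([post for _, post in control]) - mean([pre for pre, _ in control])
    return treated_change - control_change


def closest_controls(treated, controls, n):
    """Keep the n control units whose pre-period outcome is closest to the
    treated group's mean pre-period outcome (a crude stand-in for matching)."""
    target = mean([pre for pre, _ in treated])
    return sorted(controls, key=lambda unit: abs(unit[0] - target))[:n]


if __name__ == "__main__":
    # Hypothetical (pre, post) mortality rates per unit.
    treated = [(100, 110), (102, 112)]
    controls = [(60, 62), (70, 73), (99, 106), (103, 110)]
    print(diff_in_diff(treated, controls))
    print(diff_in_diff(treated, closest_controls(treated, controls, 2)))
```

With these invented numbers the estimate moves from 5.25 to 3.0 when the less similar control units are dropped. Which answer is less biased depends on whether historical events affected the groups differently, which is exactly what the abstract's simulation studies examine.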

Preprint, 2020, arXiv

ivmodel: An R Package for Inference and Sensitivity Analysis of Instrumental Variables Models with One Endogenous Variable

We present ivmodel, a comprehensive R package for analyzing instrumental variables models with one endogenous variable. The package implements a general class of estimators called k-class estimators and two confidence intervals that are fully robust to weak instruments. The package also provides power formulas for various test statistics in instrumental variables. Finally, the package contains methods for sensitivity analysis to examine the sensitivity of the inference to instrumental variables assumptions. We demonstrate the software on the data set from Card (1995), looking at the causal effect of levels of education on log earnings where the instrument is proximity to a four-year college.
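For readers unfamiliar with the k-class family, the estimator can be written as beta(k) = x'(I - k M_z) y / x'(I - k M_z) x. The sketch below is a Python illustration of that textbook formula, not code from the ivmodel R package, and it assumes a single instrument with no intercept or exogenous covariates (for example, centered variables). Setting k = 0 gives OLS and k = 1 gives two-stage least squares; LIML and Fuller estimators correspond to data-dependent choices of k that are not computed here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def k_class(y, x, z, k):
    """k-class estimate of the effect of x on y with a single instrument z.

    Assumes one endogenous regressor, one instrument, and no intercept or
    exogenous covariates. Computes
        beta(k) = x'(I - k M_z) y / x'(I - k M_z) x,
    with P_z = z z' / (z'z) and M_z = I - P_z, using
        x'(I - k M_z) y = (1 - k) x'y + k x'P_z y.
    """
    zz = dot(z, z)
    x_pz_y = dot(z, x) * dot(z, y) / zz  # x' P_z y
    x_pz_x = dot(z, x) ** 2 / zz         # x' P_z x
    numerator = (1 - k) * dot(x, y) + k * x_pz_y
    denominator = (1 - k) * dot(x, x) + k * x_pz_x
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    y = [1.0, 2.0, 3.0]
    x = [1.0, 1.0, 2.0]
    z = [1.0, 0.0, 1.0]
    print(k_class(y, x, z, 0.0))  # OLS through the origin
    print(k_class(y, x, z, 1.0))  # TSLS; with one instrument this equals z'y / z'x
```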

Preprint, 2020, arXiv

Protocol for a Study of the Effect of Surface Mining in Central Appalachia on Adverse Birth Outcomes

Surface mining has become a major method of coal mining in Central Appalachia alongside traditional underground mining. Concerns have been raised about the health effects of surface mining, particularly mountaintop removal mining, in which coal is extracted from steep mountaintops by clear-cutting forests and removing the mountaintop with explosives. We have designed a matched observational study to assess the effects of surface mining in Central Appalachia on adverse birth outcomes. This protocol describes the study's background and motivation, sample selection, and analysis plan.

Preprint, 2020, arXiv

Protocol for an Observational Study on the Effects of Early-Life Participation in Contact Sports on Later-Life Cognition in a Sample of Monozygotic and Dizygotic Swedish Twins Reared Together and Twins Reared Apart

A large body of work links traumatic brain injury (TBI) in adulthood to the onset of Alzheimer's disease (AD). AD is the chief cause of dementia, leading to reduced cognitive capacity and autonomy and increased mortality risk. More recently, researchers have sought to investigate whether TBI experienced in early life may influence trajectories of cognitive dysfunction in adulthood. It has been speculated that early-life participation in collision sports may lead to poor cognitive and mental health outcomes. However, to date, the few studies that have investigated this relationship have produced mixed results. We propose to extend this literature by conducting a prospective study on the effects of early-life participation in collision sports on later-life cognitive health using the Swedish Adoption/Twin Study on Aging (SATSA). The SATSA is unique in its sampling of monozygotic and dizygotic twins reared together (MZT and DZT, respectively) and twins reared apart (MZA and DZA, respectively). The proposed analysis is a prospective study of 660 individuals, comprising 270 twin pairs and 120 singletons. Seventy-eight individuals (11.8%) reported participation in collision sports. Our primary outcome will be an indicator of cognitive impairment determined by scores on the Mini-Mental State Examination (MMSE). We will also consider several secondary cognitive outcomes, including verbal and spatial ability, memory, and processing speed. Our sample will be restricted to individuals with at least one MMSE score out of seven repeated assessments spaced approximately three years apart. We will adjust for age, sex, and education in each of our models.

Preprint, 2020, arXiv

Protocol for an Observational Study on the Effects of Social Distancing on Influenza-Like Illness and COVID-19

The novel coronavirus disease (COVID-19) is a highly contagious respiratory disease that was first detected in Wuhan, China in December 2019, and has since spread around the globe, claiming more than 69,000 lives by the time this protocol was written. It has been widely acknowledged that the most effective public policy to mitigate the pandemic is social and physical distancing: keeping at least six feet away from people, working from home, closing non-essential businesses, etc. There is much anecdotal evidence suggesting that social distancing has a causal effect on disease mitigation; however, few studies have investigated the effect of social distancing on disease mitigation in a transparent and statistically sound manner. We propose to perform an optimal non-bipartite matching to pair counties with similar observed covariates but vastly different average social distancing scores during the first week (March 16 through March 22) of the President's "15 Days to Slow the Spread" campaign. We have produced a total of 302 pairs of U.S. counties with good covariate balance on a total of 16 important variables. Our primary outcome will be the average observed illness collected by Kinsa Inc. two weeks after the intervention period. Although observed illness does not directly measure COVID-19, it reflects a real-time aspect of the pandemic and, unlike confirmed cases, is much less confounded by counties' testing capabilities. We also consider observed illness three weeks after the intervention period as a secondary outcome. We will test a proportional treatment effect using a randomization-based test with covariance adjustment and conduct a sensitivity analysis.
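In a matched-pair design, the basic randomization test rests on the fact that, under the sharp null hypothesis of no effect, each within-pair difference is equally likely to carry either sign. The sketch below shows that simplest version: it uses the sum of differences as the statistic, omits the covariance adjustment and sensitivity analysis described in the protocol, and enumerates all 2^n sign patterns, which is only feasible for a handful of pairs (with 302 pairs one would sample sign patterns at random instead). The differences in the example are hypothetical.

```python
import itertools


def sign_flip_p_value(differences):
    """Exact one-sided randomization p-value for matched-pair differences.

    Under the sharp null hypothesis of no effect, the sign of each
    within-pair difference is equally likely to be + or -, so all 2**n
    sign patterns are enumerated. The statistic is the sum of differences.
    """
    observed = sum(differences)
    at_least_as_large = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(differences)):
        total += 1
        if sum(s * d for s, d in zip(signs, differences)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / total


if __name__ == "__main__":
    # Hypothetical within-pair differences (less-distancing county minus
    # more-distancing county) in an illness measure.
    print(sign_flip_p_value([0.4, 1.1, -0.2, 0.8, 0.5]))
```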

Preprint, 2020, arXiv

Two Robust Tools for Inference about Causal Effects with Invalid Instruments

Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. However, in practice, some of the putative instrumental variables are likely to be invalid. This paper presents two tools for valid inference and testing in the presence of invalid instruments. First, we propose a simple and general approach to constructing confidence intervals based on taking unions of well-known confidence intervals. Second, we propose a novel test of the null hypothesis of no causal effect based on collider bias. Our two proposals, especially when fused together, outperform traditional instrumental variable confidence intervals when invalid instruments are present, and can also be used as a sensitivity analysis when there is concern that instrumental variables assumptions are violated. The new approach is applied to a Mendelian randomization study on the causal effect of low-density lipoprotein on the incidence of cardiovascular diseases.
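The reasoning behind a union of confidence intervals is short: if an interval is computed under each candidate assumption about which instruments are valid, and one candidate is correct, the interval for that candidate covers the true effect at its nominal level, so the union does too. The sketch below performs only the union step; how the individual intervals are produced (for example, which subsets of instruments are considered) is left out, and the intervals in the example are hypothetical. Because the union need not be a single interval, the function returns a list of disjoint pieces.

```python
def union_of_intervals(intervals):
    """Union of closed intervals, returned as sorted, non-overlapping pieces."""
    merged = []
    for lower, upper in sorted(intervals):
        if merged and lower <= merged[-1][1]:
            # Overlaps (or touches) the previous piece: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


if __name__ == "__main__":
    # Hypothetical 95% intervals, one per candidate set of valid instruments.
    candidate_intervals = [(0.3, 0.9), (0.1, 0.5), (1.2, 1.5)]
    print(union_of_intervals(candidate_intervals))
```

The price of this robustness is width: the union is at least as wide as every interval it contains.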

Preprint, 2012, arXiv

Mediation Analysis Without Sequential Ignorability: Using Baseline Covariates Interacted with Random Assignment as Instrumental Variables

In randomized trials, researchers are often interested in mediation analysis to understand how a treatment works, in particular how much of a treatment's effect is mediated by an intermediate variable and how much the treatment directly affects the outcome not through the mediator. The standard regression approach to mediation analysis assumes sequential ignorability of the mediator, that is, that the mediator is effectively randomly assigned given baseline covariates and the randomized treatment. Since the experiment does not randomize the mediator, sequential ignorability is often not plausible. Ten Have et al. (2007, Biometrics), Dunn and Bentall (2007, Statistics in Medicine) and Albert (2008, Statistics in Medicine) presented methods that use baseline covariates interacted with random assignment as instrumental variables and do not require sequential ignorability. We make two contributions to this approach. First, previous work on the instrumental variable approach has assumed that the direct effect of treatment and the effect of the mediator are constant across subjects; we allow for variation in effects across subjects and show what assumptions are needed to obtain consistent estimates in this setting. Second, we develop a method of sensitivity analysis for violations of the key assumption that the direct effect of the treatment and the effect of the mediator do not depend on the baseline covariates.

Preprint, 2011, arXiv

Causal inference for continuous-time processes when covariates are observed only at discrete times

Most of the work on the structural nested model and g-estimation for causal inference in longitudinal data assumes a discrete-time underlying data generating process. However, in some observational studies, it is more reasonable to assume that the data are generated from a continuous-time process and are only observable at discrete time points. When these circumstances arise, the sequential randomization assumption in the observed discrete-time data, which is essential in justifying discrete-time g-estimation, may not be reasonable. Under a deterministic model, we discuss other useful assumptions that guarantee the consistency of discrete-time g-estimation. In more general cases, when those assumptions are violated, we propose a controlling-the-future method that performs at least as well as g-estimation in most scenarios and which provides consistent estimation in some cases where g-estimation is severely inconsistent. We apply the methods discussed in this paper to simulated data, as well as to a data set collected following a massive flood in Bangladesh, estimating the effect of diarrhea on children's height. Results from different methods are compared in both simulation and the real application.

Preprint, 2011, arXiv

The effect of winning an Oscar Award on survival: Correcting for healthy performer survivor bias with a rank preserving structural accelerated failure time model

We study the causal effect of winning an Oscar Award on an actor's or actress's survival. Does the increase in social rank from winning an Oscar increase a performer's life expectancy? Previous studies of this issue have suffered from healthy performer survivor bias: performers who are healthier are able to act in more films and so have more chances to win Oscar Awards. To correct this bias, we adapt Robins' rank preserving structural accelerated failure time model and g-estimation method. We show in simulation studies that this approach corrects the bias contained in previous studies. We estimate that the effect of winning an Oscar Award on survival is 4.2 years, with a 95% confidence interval of [-0.4, 8.4] years. There is no strong evidence that winning an Oscar increases life expectancy.
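The central object in a rank preserving structural accelerated failure time model is the counterfactual "untreated" survival time U(psi), obtained by rescaling the time a person spends treated by exp(psi). The sketch below computes that mapping for a single time-varying treatment that switches on at the win; the time origin (for example, first nomination) and the sign convention (psi < 0 means winning lengthens survival) are assumptions. The g-estimation step, which searches for the psi at which U(psi) is unrelated to treatment history given covariates, and any handling of censoring are omitted.

```python
import math


def counterfactual_time(observed_time, win_time, psi):
    """Baseline ("never won") survival time under a rank preserving
    structural accelerated failure time model.

    U(psi) = integral from 0 to T of exp(psi * A(t)) dt, where A(t) = 1
    after the win and 0 before. Time is measured from a hypothetical
    origin such as first nomination; win_time is None for performers who
    never won. With this sign convention, psi < 0 means winning
    lengthens survival.
    """
    if win_time is None or win_time >= observed_time:
        return observed_time
    return win_time + math.exp(psi) * (observed_time - win_time)


if __name__ == "__main__":
    # A hypothetical winner observed for 30 years who won at year 10.
    for psi in (0.0, -0.1, -0.2):
        print(psi, counterfactual_time(30.0, 10.0, psi))
```

Rank preservation means that, for a given psi, this mapping is applied deterministically to every performer, so the ordering of counterfactual survival times is fixed by the observed data.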

Preprint, 2010, arXiv

Hidden Markov models for alcoholism treatment trial data

In a clinical trial of a treatment for alcoholism, a common response variable of interest is the number of alcoholic drinks consumed by each subject each day, or an ordinal version of this response, with levels corresponding to abstinence, light drinking and heavy drinking. In these trials, within-subject drinking patterns are often characterized by alternating periods of heavy drinking and abstinence. For this reason, many statistical models for time series that assume steady behavior over time and white noise errors do not fit alcohol data well. In this paper, we propose to describe subjects' drinking behavior using Markov models and hidden Markov models (HMMs), which are better suited to describing processes that make sudden, rather than gradual, changes over time. We incorporate random effects into these models using a hierarchical Bayes structure to account for correlated responses within subjects over time, and we estimate the effects of covariates, including a randomized treatment, on the outcome in a novel way. We illustrate the models by fitting them to a large data set from a clinical trial of the drug Naltrexone. The HMM, in particular, fits these data well and also has unique features that allow for useful clinical interpretations of alcohol consumption behavior.
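The basic computation behind fitting any HMM is the likelihood of an observed sequence, which the forward algorithm obtains by summing over hidden-state paths one time step at a time. The sketch below is a plain discrete HMM with fixed, hypothetical parameters: two hidden states and the three ordinal drinking categories from the abstract. It has none of the paper's random effects, covariates, or hierarchical Bayes structure.

```python
# Hypothetical parameters: two hidden states (0 = "mostly abstinent",
# 1 = "mostly heavy drinking") and three observed daily categories
# (0 = abstinent, 1 = light drinking, 2 = heavy drinking).
INITIAL = [0.6, 0.4]
TRANSITION = [[0.9, 0.1],
              [0.2, 0.8]]
EMISSION = [[0.7, 0.2, 0.1],
            [0.1, 0.3, 0.6]]


def forward_likelihood(observations, initial, transition, emission):
    """Probability of an observation sequence under a discrete HMM
    (forward algorithm).

    initial[i]:       P(first hidden state = i)
    transition[i][j]: P(next hidden state = j | current hidden state = i)
    emission[i][o]:   P(observed category o | hidden state i)
    """
    n_states = len(initial)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    week = [0, 0, 2, 2, 2, 0, 0]  # hypothetical daily sequence
    print(forward_likelihood(week, INITIAL, TRANSITION, EMISSION))
```

Persistent hidden states (large diagonal entries in TRANSITION) are what let such a model produce runs of heavy drinking and abstinence with abrupt switches, the pattern the abstract describes.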