Researcher profile

Samuel V. Scarpino

Samuel V. Scarpino contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
9 works
0 followers
8 topics
4 close collaborators

Published work

9 published items

Preprint · 2022 · arXiv

Effective Resistance for Pandemics: Mobility Network Sparsification for High-Fidelity Epidemic Simulation

Network science has increasingly become central to the field of epidemiology and our ability to respond to infectious disease threats. However, many networks derived from modern datasets are not just large, but dense, with a high ratio of edges to nodes. This includes human mobility networks where most locations have a large number of links to many other locations. Simulating large-scale epidemics requires substantial computational resources and in many cases is practically infeasible. One way to reduce the computational cost of simulating epidemics on these networks is sparsification, where a representative subset of edges is selected based on some measure of their importance. We test several sparsification strategies, ranging from naive thresholding to random sampling of edges, on mobility data from the U.S. Following recent work in computer science, we find that the most accurate approach uses the effective resistances of edges, which prioritizes edges that are the only efficient way to travel between their endpoints. The resulting sparse network preserves many aspects of the behavior of an SIR model, including both global quantities, like the epidemic size, and local details of stochastic events, including the probability each node becomes infected and its distribution of arrival times. This holds even when the sparse network preserves fewer than $10\%$ of the edges of the original network. In addition to its practical utility, this method helps illuminate which links of a weighted, undirected network are most important to disease spread.
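The importance score behind this approach can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the paper's code: it computes each edge's effective resistance $R_e$ by treating the weighted graph as an electrical network (edge weight $w_e$ as conductance), grounding node 0, and solving for node potentials; it then keeps the fraction of edges with the largest $w_e R_e$. Ranking deterministically by $w_e R_e$ is an assumption made for brevity. The spectral-sparsification literature (Spielman and Srivastava) instead samples edges with probability proportional to $w_e R_e$ and reweights the kept edges, and the paper's exact procedure may differ. All function names are made up, and the dense pure-Python solver is only meant for small toy graphs.

```python
def _solve(A, b):
    """Solve A x = b for a small dense nonsingular matrix (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def effective_resistances(n, edges, weights):
    """Effective resistance R_e of every edge (u, v) in a connected, weighted, undirected graph.

    Each edge is a conductor with conductance w_e. Push one unit of current in at u
    and out at v, then read off the potential difference phi[u] - phi[v].
    Node 0 is grounded (phi[0] = 0) so the reduced Laplacian is invertible.
    """
    L = [[0.0] * n for _ in range(n)]
    for (u, v), w in zip(edges, weights):
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    reduced = [row[1:] for row in L[1:]]
    resistances = []
    for u, v in edges:
        current = [0.0] * n
        current[u] += 1.0
        current[v] -= 1.0
        phi = [0.0] + _solve(reduced, current[1:])
        resistances.append(phi[u] - phi[v])
    return resistances


def sparsify_top_fraction(n, edges, weights, keep_fraction):
    """Indices of the edges with the largest w_e * R_e.

    Hypothetical helper: a deterministic stand-in for the importance sampling
    used in spectral sparsification.
    """
    resistances = effective_resistances(n, edges, weights)
    scores = [w * r for w, r in zip(weights, resistances)]
    k = max(1, round(keep_fraction * len(edges)))
    ranked = sorted(range(len(edges)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 plus a pendant edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    weights = [1.0, 1.0, 1.0, 1.0]
    print(effective_resistances(4, edges, weights))
    print(sparsify_top_fraction(4, edges, weights, keep_fraction=0.25))  # [3], the bridge (2, 3)
```

In the toy graph each triangle edge has $R_e = 2/3$ (a direct unit path in parallel with a two-edge path), while the pendant edge is a bridge, the only way to reach node 3, so $w_e R_e = 1$, the largest value possible. It therefore survives even aggressive sparsification, which matches the intuition that effective resistance favors edges that are the only efficient route between their endpoints.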

Preprint · 2022 · arXiv

The role of directionality, heterogeneity and correlations in epidemic risk and spread

Most models of epidemic spread, including many designed specifically for COVID-19, implicitly assume mass-action contact patterns and undirected contact networks, meaning that the individuals most likely to spread the disease are also the most at risk of receiving it from others. Here, we review results from the theory of random directed graphs which show that many important quantities, including the reproduction number and the epidemic size, depend sensitively on the joint distribution of in- and out-degrees ("risk" and "spread"), including their heterogeneity and the correlation between them. By considering joint distributions of various kinds, we elucidate why some types of heterogeneity cause a deviation from the standard Kermack-McKendrick analysis of SIR models, i.e., so-called mass-action models where contacts are homogeneous and random, while others do not. We also show that some structured SIR models informed by realistic complex contact patterns among types of individuals (age or activity) are simply mixtures of Poisson processes and tend not to deviate significantly from the simplest mass-action model. Finally, we point out some possible policy implications of this directed structure, both for contact tracing strategy and for interventions designed to prevent superspreading events. In particular, directed graphs have a forward and a backward version of the classic "friendship paradox": forward edges tend to lead to individuals with high risk, while backward edges lead to individuals with high spread. As a result, a combination of forward and backward contact tracing is necessary to find superspreading events and prevent future cascades of infection.
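The sensitivity to the joint in/out-degree distribution can be seen in a standard configuration-model calculation. The sketch below assumes that every directed contact transmits and that individuals are wired together at random given their degrees; it is an illustration under those assumptions, not the paper's exact formulation, and the function names are hypothetical. Following a random edge forward lands on an individual with probability proportional to in-degree $j$ ("risk"), who can then infect along $k$ out-edges ("spread"), so $R = \langle jk \rangle / \langle j \rangle$. The same edge-following argument gives the forward and backward friendship paradoxes.

```python
from fractions import Fraction


def reproduction_number(degrees):
    """R = <j k> / <j> for a directed configuration model in which every contact transmits.

    `degrees` holds one (in_degree j, out_degree k) pair per individual type, in equal
    proportions. Following a random edge forward reaches an individual with probability
    proportional to j, and that individual can pass infection on along its k out-edges.
    """
    total_in = sum(j for j, _ in degrees)
    total_out = sum(k for _, k in degrees)
    if total_in != total_out:
        raise ValueError("in- and out-degrees must sum to the same number of edges")
    return Fraction(sum(j * k for j, k in degrees), total_in)


def mean_risk_forward(degrees):
    """Mean in-degree ("risk") of an individual reached by following an edge forward."""
    return Fraction(sum(j * j for j, _ in degrees), sum(j for j, _ in degrees))


def mean_spread_backward(degrees):
    """Mean out-degree ("spread") of an individual reached by following an edge backward."""
    return Fraction(sum(k * k for _, k in degrees), sum(k for _, k in degrees))


if __name__ == "__main__":
    correlated = [(1, 1), (3, 3)]      # high-risk individuals are also high-spread
    anticorrelated = [(1, 3), (3, 1)]  # same marginal degrees, opposite pairing
    print(reproduction_number(correlated))      # 5/2
    print(reproduction_number(anticorrelated))  # 3/2
    print(mean_risk_forward(anticorrelated), mean_spread_backward(anticorrelated))  # 5/2 5/2
```

Both toy populations (each split evenly between two types) have the same marginal degrees and a mean degree of 2, which is what independent pairing of in- and out-degrees would give as $R$. Pairing risk with spread raises $R$ to 5/2, while anticorrelating them lowers it to 3/2. In the anticorrelated case, forward edges still reach individuals with above-average risk and backward edges reach individuals with above-average spread.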

Preprint · 2020 · arXiv

Beyond $R_0$: Heterogeneity in secondary infections and probabilistic epidemic forecasting

The basic reproductive number, $R_0$, is one of the most common and most commonly misapplied numbers in public health. Although often used to compare outbreaks and forecast pandemic risk, this single number belies the complexity that two different pathogens can exhibit, even when they have the same $R_0$. Here, we show how to predict outbreak size using estimates of the distribution of secondary infections, leveraging both its average, $R_0$, and the underlying heterogeneity. To do so, we reformulate and extend a classic result from random network theory that relies on contact tracing data to simultaneously determine the first moment ($R_0$) and the higher moments (representing the heterogeneity) of the distribution of secondary infections. Further, we show the different ways in which this framework can be implemented in the data-scarce reality of emerging pathogens. Lastly, we demonstrate that without data on the heterogeneity in secondary infections for emerging infectious diseases like COVID-19, predicted outbreak sizes span a dramatically wide range. Taken together, our work highlights the critical need for contact tracing during emerging infectious disease outbreaks and the need to look beyond $R_0$ when predicting epidemic size.
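The point that two pathogens with the same $R_0$ can behave very differently can be illustrated with a textbook branching-process calculation, which is simpler than the network framework described above. If $G(x) = \sum_n p_n x^n$ is the probability generating function of the number of secondary infections, the probability that a single introduction eventually dies out is the smallest root of $u = G(u)$ in $[0, 1]$, and $1 - u$ is the probability of a large outbreak. The sketch compares a Poisson offspring distribution (homogeneous transmission) with a negative binomial of the same mean; the negative binomial parameterization (dispersion $k$, smaller meaning more superspreading) and the function names are assumptions for illustration, not the paper's method.

```python
import math


def poisson_pgf(x, r0):
    """PGF of a Poisson(r0) offspring distribution (homogeneous transmission)."""
    return math.exp(r0 * (x - 1.0))


def negbin_pgf(x, r0, k):
    """PGF of a negative binomial with mean r0 and dispersion k (small k = few superspreaders)."""
    return (1.0 + (r0 / k) * (1.0 - x)) ** (-k)


def extinction_probability(pgf, tol=1e-12, max_iter=100_000):
    """Smallest root of u = G(u) on [0, 1], found by iterating from u = 0.

    Starting at 0, u_{t+1} = G(u_t) is the probability that the chain of infections
    has died out by generation t + 1; it increases monotonically to the extinction
    probability.
    """
    u = 0.0
    for _ in range(max_iter):
        u_next = pgf(u)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    return u


if __name__ == "__main__":
    r0 = 2.5
    for label, pgf in [
        ("Poisson", lambda x: poisson_pgf(x, r0)),
        ("negative binomial, k = 0.1", lambda x: negbin_pgf(x, r0, 0.1)),
    ]:
        print(label, "P(large outbreak) =", 1.0 - extinction_probability(pgf))
```

With these parameters the homogeneous case produces a large outbreak from most introductions, while the highly overdispersed case does so in only a minority of introductions, even though both have $R_0 = 2.5$. As $k$ grows, the negative binomial result approaches the Poisson one.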

Preprint · 2020 · arXiv

Mobile phone data and COVID-19: Missing an opportunity?

This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and the reasons why this kind of data is rarely used, even though its value has been demonstrated in a number of use cases during similar epidemics. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups at the national and regional levels, and the early inclusion and support of governments and public authorities. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who have jointly put their work at the service of the global effort to combat the COVID-19 pandemic.

Preprint · 2020 · arXiv

Stochasticity and heterogeneity in the transmission dynamics of SARS-CoV-2

SARS-CoV-2, the virus that causes COVID-19, has moved rapidly around the globe, infecting millions and killing hundreds of thousands. The basic reproduction number, which has been widely used and misused to characterize the transmissibility of the virus, hides the fact that transmission is stochastic, is dominated by a small number of individuals, and is driven by super-spreading events (SSEs). These distinct transmission features, such as high stochasticity under low prevalence and the central role that SSEs play in transmission dynamics, should not be overlooked. Many explosive SSEs, which have stoked the pandemic and shaped its spread, have occurred in indoor settings such as long-term care facilities, prisons, meat-packing plants, fish factories, cruise ships, family gatherings, parties and night clubs. These SSEs demonstrate the urgent need to understand routes of transmission, but they also present an opportunity: outbreaks can be effectively contained with targeted interventions that eliminate SSEs. Here, we describe the potential types of SSEs and how they influence transmission, and we give recommendations for the control of SARS-CoV-2.

Preprint · 2019 · arXiv

Interacting contagions are indistinguishable from social reinforcement

From fake news to innovative technologies, many contagions spread via a process of social reinforcement, where multiple exposures are distinct from prolonged exposure to a single source. By contrast, biological agents such as Ebola or measles are typically thought to spread as simple contagions. Here, we demonstrate that interacting simple contagions are indistinguishable from complex contagions. In the social context, our results highlight the challenge of identifying and quantifying mechanisms, such as social reinforcement, in a world where innumerable ideas, memes and behaviors interact. In the biological context, this parallel allows the use of complex contagions to effectively quantify the non-trivial interactions of infectious diseases.

Preprint · 2014 · arXiv

Epidemiological consequences of an ineffective Bordetella pertussis vaccine

The recent increase in Bordetella pertussis incidence (whooping cough) presents a challenge to global health. Recent studies have called into question the effectiveness of acellular B. pertussis vaccination in reducing transmission. Here we examine the epidemiological consequences of an ineffective B. pertussis vaccine. Using a dynamic transmission model, we find that: 1) an ineffective vaccine can account for the observed increase in B. pertussis incidence; 2) asymptomatic infections can bias surveillance and upset situational awareness of B. pertussis; and 3) vaccinating individuals in close contact with infants too young to receive vaccine (so-called "cocooning" of unvaccinated children) may be ineffective. Our results have important implications for B. pertussis vaccination policy and paint a complicated picture for achieving herd immunity and possible B. pertussis eradication.

Preprint · 2014 · arXiv

multiDimBio: An R Package for the Design, Analysis, and Visualization of Systems Biology Experiments

The past decade has witnessed a dramatic increase in the size and scope of biological and behavioral experiments. These experiments are providing an unprecedented level of detail and depth of data. However, this increase in data presents substantial statistical and graphical hurdles to overcome, namely how to distinguish signal from noise and how to visualize multidimensional results. Here we present a series of tools designed to support a research project from inception to publication. We provide implementations of dimension reduction techniques and visualizations that function well with the types of data often seen in animal behavior studies. This package is designed to be used with experimental data but can also be used for experimental design and sample justification. The goal for this project is to create a package that will evolve over time, thereby remaining relevant and reflective of current methods and techniques.

Preprint · 2012 · arXiv

Estimation with Binned Data

Variables such as household income are sometimes binned, so that we only know how many households fall in each of several bins such as $0-10,000, $10,000-15,000, or $200,000+. We provide a SAS macro that estimates the mean and variance of binned data by fitting the extended generalized gamma (EGG) distribution, the power normal (PN) distribution, and a new distribution that we call the power logistic (PL). The macro also implements a "best-of-breed" estimator that chooses from among the EGG, PN, and PL estimates on the basis of likelihood and finite variance. We test the macro by estimating the mean family and household incomes of approximately 13,000 US school districts between 1970 and 2009. The estimates have negligible bias (0-2%) and a root mean squared error of just 3-6%. The estimates compare favorably with estimates obtained by fitting the Dagum, generalized beta (GB2), or logspline distributions.
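The core idea of estimating a mean from bin counts alone can be sketched with a deliberately simpler distribution than the ones the macro uses. The Python below substitutes a lognormal for the EGG, PN and PL distributions; this is an assumption for illustration only, it is not the SAS macro, and it does not implement the best-of-breed selection. Each bin's probability is a difference of CDF values, the open-ended top bin (such as $200,000+) uses an upper edge of infinity, the multinomial log-likelihood of the counts is maximized by a coarse grid search, and the fitted mean is $\exp(\mu + \sigma^2/2)$. Function names, bin edges and grid ranges are hypothetical, and the example counts are synthetic.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal whose logarithm is Normal(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def binned_loglik(edges, counts, mu, sigma):
    """Multinomial log-likelihood of bin counts; bin i spans [edges[i], edges[i + 1])."""
    ll = 0.0
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        if n == 0:
            continue  # empty bins contribute nothing
        p = lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma)
        if p <= 0.0:
            return -math.inf  # parameters give zero probability to an observed bin
        ll += n * math.log(p)
    return ll


def fit_binned_mean(edges, counts, mu_grid, sigma_grid):
    """Grid-search maximum likelihood for (mu, sigma); returns mu, sigma and the implied mean."""
    _, mu, sigma = max(
        (binned_loglik(edges, counts, m, s), m, s) for m in mu_grid for s in sigma_grid
    )
    return mu, sigma, math.exp(mu + sigma ** 2 / 2.0)


if __name__ == "__main__":
    # Synthetic bin counts generated from a lognormal with mu = 10.8, sigma = 0.7.
    edges = [0.0, 10_000, 15_000, 25_000, 50_000, 100_000, 200_000, math.inf]
    counts = [
        1000 * (lognormal_cdf(hi, 10.8, 0.7) - lognormal_cdf(lo, 10.8, 0.7))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    mu_grid = [10.0 + 0.01 * i for i in range(151)]
    sigma_grid = [0.3 + 0.01 * i for i in range(91)]
    print(fit_binned_mean(edges, counts, mu_grid, sigma_grid))
```

A grid search keeps the sketch dependency-free; a real implementation would use a numerical optimizer and, as the abstract describes, compare several candidate distributions by likelihood and check that the fitted variance is finite.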