Researcher profile

Jon Wakefield

Jon Wakefield contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
7 works
0 followers
3 topics
4 close collaborators

Published work

7 published items

Preprint, 2022, arXiv

Estimating Global and Country-Specific Excess Mortality During the COVID-19 Pandemic

Estimating the true mortality burden of COVID-19 for every country in the world is a difficult, but crucial, public health endeavor. Attributing deaths, direct or indirect, to COVID-19 is problematic. A more attainable target is the "excess deaths", the number of deaths in a particular period, relative to that expected during "normal times", and we estimate this for all countries on a monthly time scale for 2020 and 2021. The excess mortality requires two numbers, the total deaths and the expected deaths, but the former is unavailable for many countries, and so modeling is required for these countries. The expected deaths are based on historic data and we develop a model for producing expected estimates for all countries and we allow for uncertainty in the modeled expected numbers when calculating the excess. We describe the methods that were developed to produce the World Health Organization (WHO) excess death estimates. To achieve both interpretability and transparency we developed a relatively simple overdispersed Poisson count framework, within which the various data types can be modeled. We use data from countries with national monthly data to build a predictive log-linear regression model with time-varying coefficients for countries without data. For a number of countries, subnational data only are available, and we construct a multinomial model for such data, based on the assumption that the fractions of deaths in sub-regions remain approximately constant over time. Based on our modeling, the point estimate for global excess mortality, over 2020-2021, is 14.9 million, with a 95% credible interval of (13.3, 16.6) million. This leads to a point estimate of the ratio of excess deaths to reported COVID-19 deaths of 2.75, which is a huge discrepancy.
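The definition of excess deaths and the headline ratio in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the monthly counts are made-up placeholders, and the "implied reported deaths" figure is a back-of-envelope value derived from the two numbers quoted in the abstract, not a number stated by the authors.

```python
def excess_deaths(observed, expected):
    """Excess deaths: observed all-cause deaths minus the expected count."""
    return observed - expected


def implied_reported_deaths(excess, ratio):
    """Reported deaths implied by an excess estimate and an excess-to-reported ratio."""
    return excess / ratio


# Hypothetical monthly counts for one country (placeholders, not real data).
monthly_excess = excess_deaths(observed=1200, expected=1000)

# Figures quoted in the abstract (millions of deaths, 2020-2021).
global_excess = 14.9
ratio_excess_to_reported = 2.75
reported = implied_reported_deaths(global_excess, ratio_excess_to_reported)

print(monthly_excess)
print(f"{reported:.2f} million reported deaths implied")
```

The paper's actual estimates also carry uncertainty in the expected counts, which this point calculation ignores.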

Preprint, 2022, arXiv

Smoothed Model-Assisted Small Area Estimation

In countries where population census data are limited, generating accurate subnational estimates of health and demographic indicators is challenging. Existing model-based geostatistical methods leverage covariate information and spatial smoothing to reduce the variability of estimates but often ignore survey design, while traditional small area estimation approaches may not incorporate both unit level covariate information and spatial smoothing in a design-consistent way. We propose a smoothed model-assisted estimator that accounts for survey design and leverages both unit level covariates and spatial smoothing. Under certain assumptions, this estimator is both design-consistent and model-consistent. We compare it with existing design-based and model-based estimators using real and simulated data.

Preprint, 2022, arXiv

Spatial Aggregation with Respect to a Population Distribution

Spatial aggregation with respect to a population distribution involves estimating aggregate quantities for a population based on an observation of individuals in a subpopulation. In this context, a geostatistical workflow must account for three major sources of 'aggregation error': aggregation weights, fine scale variation, and finite population variation. However, common practice is to treat the unknown population distribution as a known population density and ignore empirical variability in outcomes. We improve common practice by introducing a 'sampling frame model' that allows aggregation models to account for the three sources of aggregation error simply and transparently. We compare the proposed and the traditional approach using two simulation studies that mimic neonatal mortality rate (NMR) data from the 2014 Kenya Demographic and Health Survey (KDHS2014). For the traditional approach, undercoverage/overcoverage depends arbitrarily on the aggregation grid resolution, while the new approach exhibits low sensitivity. The differences between the two aggregation approaches increase as the population of an area decreases. The differences are substantial at the second administrative level and finer, but also at the first administrative level for some population quantities. We find differences between the proposed and traditional approach are consistent with those we observe in an application to NMR data from the KDHS2014.
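For orientation, the "traditional approach" described in this abstract (treating the population density as known and using it as fixed aggregation weights) amounts to a population-weighted average of grid-cell predictions. The following is a minimal sketch under that reading; the function name and the cell values are hypothetical, and it does not implement the authors' sampling frame model.

```python
def aggregate_rate(cell_rates, cell_populations):
    """Population-weighted average of grid-cell rates, with weights treated as known."""
    if len(cell_rates) != len(cell_populations):
        raise ValueError("rates and populations must have the same length")
    total_population = sum(cell_populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(r * p for r, p in zip(cell_rates, cell_populations))
    return weighted_sum / total_population


# Hypothetical area made of two grid cells (placeholder values).
area_rate = aggregate_rate(cell_rates=[0.02, 0.04], cell_populations=[300, 100])
print(area_rate)
```

Because the weights are fixed, this calculation carries no uncertainty from the population distribution or from finite population variation, which are the sources of aggregation error the abstract says are commonly ignored.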

Preprint, 2022, arXiv

The Central Role of the Identifying Assumption in Population Size Estimation

The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data problem, where the number of unobserved individuals represents the missing data. As with any missing data problem, multiple-systems estimation requires users to make an untestable identifying assumption in order to estimate the population size from the observed data. If an appropriate identifying assumption cannot be found for a data set, no estimate of the population size should be produced based on that data set, as models with different identifying assumptions can produce arbitrarily different population size estimates, even with identical observed data fits. Approaches to multiple-systems estimation often do not explicitly specify identifying assumptions. This makes it difficult to decouple the specification of the model for the observed data from the identifying assumption and to provide justification for the identifying assumption. We present a re-framing of the multiple-systems estimation problem that leads to an approach which decouples the specification of the observed-data model from the identifying assumption, and discuss how common models fit into this framing. This approach takes advantage of existing software and facilitates various sensitivity analyses. We demonstrate our approach in a case study estimating the number of civilian casualties in the Kosovo war. Code used to produce this manuscript is available at https://github.com/aleshing/central-role-of-identifying-assumptions.
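The point about identifying assumptions can be seen in the simplest two-list case. The sketch below is a textbook illustration, not the authors' framework: with two lists, the observed counts are identical under any assumed odds ratio between the lists, yet the estimated number of unobserved individuals scales directly with that assumption. Setting the odds ratio to 1 (independence) gives the classical Lincoln-Petersen estimate. The function name and the counts are hypothetical.

```python
def estimate_population(n_both, n_first_only, n_second_only, odds_ratio=1.0):
    """Two-list population size estimate under an assumed odds ratio between lists.

    odds_ratio = n_both * n_neither / (n_first_only * n_second_only) is the
    untestable identifying assumption; 1.0 corresponds to independence.
    """
    if n_both <= 0:
        raise ValueError("need at least one individual on both lists")
    n_neither = odds_ratio * n_first_only * n_second_only / n_both
    return n_both + n_first_only + n_second_only + n_neither


# Hypothetical counts: 20 on both lists, 80 on list 1 only, 30 on list 2 only.
independent = estimate_population(20, 80, 30)                  # independence assumed
dependent = estimate_population(20, 80, 30, odds_ratio=2.0)    # positive dependence assumed

print(independent, dependent)
```

Both calls use the same observed data, so nothing in the data can choose between the two estimates; only the assumption differs.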

Preprint, 2020, arXiv

Bayesian Multiresolution Modeling Of Georeferenced Data

Current implementations of multiresolution methods are limited in terms of possible types of responses and approaches to inference. We provide a multiresolution approach for spatial analysis of non-Gaussian responses using latent Gaussian models and Bayesian inference via integrated nested Laplace approximation (INLA). The approach builds on 'LatticeKrig', but uses a reparameterization of the model parameters that is intuitive and interpretable so that modeling and prior selection can be guided by expert knowledge about the different spatial scales at which dependence acts. The priors can be used to make inference robust and integration over model parameters allows for more accurate posterior estimates of uncertainty. The extended LatticeKrig (ELK) model is compared to a standard implementation of LatticeKrig (LK), and a standard Matérn model, and we find modest improvement in spatial oversmoothing and prediction for the ELK model for counts of secondary education completion for women in Kenya collected in the 2014 Kenya Demographic and Health Survey. Through a simulation study with Gaussian responses and a realistic mix of short and long scale dependencies, we demonstrate that the differences between the three approaches for prediction increase with distance to the nearest observation.

Preprint, 2020, arXiv

Estimation of Health and Demographic Indicators with Incomplete Geographic Information

In low and middle income countries, household surveys are a valuable source of information for a range of health and demographic indicators. Increasingly, subnational estimates are required for targeting interventions and evaluating progress towards targets. In the majority of cases, stratified cluster sampling is used, with clusters corresponding to enumeration areas. The reported geographical information varies. A common procedure, to preserve confidentiality, is to give a jittered location in which the true centroid of the cluster is displaced under a known algorithm. An alternative situation, which was used for older surveys in particular, is to report the geographical region within which the cluster lies. In this paper, we describe a spatial hierarchical model in which we account for inaccuracies in the cluster locations. The computational algorithm we develop is fast and avoids the heavy computation of a pure MCMC approach. We illustrate by simulation the benefits of the model over naive alternatives.
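The jittering idea mentioned in this abstract can be sketched as a random displacement of the cluster centroid. This is a minimal illustration only: the abstract does not specify the algorithm, so the uniform direction, the uniform distance, the 5 km cap, the planar kilometre coordinates, and the function name are all assumptions made for the example.

```python
import math
import random


def jitter_location(x_km, y_km, max_km, rng):
    """Displace a planar point by a random distance (at most max_km) in a random direction."""
    angle = rng.uniform(0.0, 2.0 * math.pi)   # assumed: direction uniform on the circle
    distance = rng.uniform(0.0, max_km)       # assumed: distance uniform up to the cap
    return x_km + distance * math.cos(angle), y_km + distance * math.sin(angle)


rng = random.Random(0)                         # fixed seed so the sketch is repeatable
true_centroid = (10.0, 20.0)                   # hypothetical cluster centroid, in km
reported = jitter_location(*true_centroid, max_km=5.0, rng=rng)
displacement = math.dist(true_centroid, reported)

print(reported, displacement)
```

An analyst sees only the reported point, so a model that accounts for location inaccuracy has to treat the true centroid as unknown within the displacement radius.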

Preprint, 2020, arXiv

Small Area Estimation of Health Outcomes

Small area estimation (SAE) entails estimating characteristics of interest for domains, often geographical areas, in which there may be few or no samples available. SAE has a long history and a wide variety of methods have been suggested, from a bewildering range of philosophical standpoints. We describe design-based and model-based approaches and models that are specified at the area-level and at the unit-level, focusing on health applications and fully Bayesian spatial models. The use of auxiliary information is a key ingredient for successful inference when response data are sparse and we discuss a number of approaches that allow the inclusion of covariate data. SAE for HIV prevalence, using data collected from a Demographic Health Survey in Malawi in 2015-2016, is used to illustrate a number of techniques. The potential use of SAE techniques for outcomes related to COVID-19 is discussed.