Source author record

Andrew Gelman

Andrew Gelman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Applications Methodology Computation Machine Learning stat.OT math.ST physics.soc-ph Statistics Theory physics.data-an Artificial Intelligence Digital Libraries Human-Computer Interaction math.HO Populations and Evolution Quantitative Methods Social and Information Networks

Catalog footprint

What is connected

39 works

16 topics

4 close collaborators

preprint, 2026, arXiv

Multilevel Regression and Poststratification Interface: An Application to Track Community-level COVID-19 Viral Transmission

We present a novel Bayesian workflow for multilevel regression and poststratification (MRP), introducing extensions to time-varying data and granular geography and publicly available open-source computation tools, facilitating broad research adoption and reproducibility. In the absence of comprehensive or random testing throughout the COVID-19 pandemic, we have developed a proxy method for synthetic random sampling to estimate community-level viral incidence, based on viral RNA testing of asymptomatic patients who present for elective procedures within a hospital system. The approach collects routine testing data on SARS-CoV-2 exposure among outpatients and performs statistical adjustments of sample representation using MRP, a procedure that adjusts for nonrepresentativeness of the sample and yields stable small group estimates. We illustrate the MRP interface with an application to track community-level COVID-19 viral transmission in the state of Michigan.
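The poststratification step that this abstract relies on can be sketched as a population-weighted average of per-cell estimates. This is a minimal illustration, not the authors' interface: the function name `poststratify`, the cell labels, and all numbers are hypothetical, and the per-cell estimates are assumed to come from a multilevel model fitted elsewhere.

```python
def poststratify(cell_estimates, cell_population):
    """Return the population-weighted average of per-cell estimates."""
    total = sum(cell_population[cell] for cell in cell_estimates)
    weighted = sum(
        cell_estimates[cell] * cell_population[cell] for cell in cell_estimates
    )
    return weighted / total


# Hypothetical model-based incidence estimates per (age group, county) cell.
cell_estimates = {("18-44", "A"): 0.10, ("45+", "A"): 0.20}
# Hypothetical census counts for the same cells.
cell_population = {("18-44", "A"): 3000, ("45+", "A"): 1000}

community_estimate = poststratify(cell_estimates, cell_population)
print(community_estimate)
```

Because the weights come from the population rather than the sample, a cell that is overrepresented among tested patients does not dominate the community-level estimate.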

preprint, 2022, arXiv

Beyond Vaccination Rates: A Synthetic Random Proxy Metric of Total SARS-CoV-2 Immunity Seroprevalence in the Community

Explicit knowledge of total community-level immune seroprevalence is critical to developing policies to mitigate the social and clinical impact of SARS-CoV-2. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally-acquired immunity, which varies broadly throughout the country and world. Without broad or random sampling of the population, accurate measurement of persistent immunity post natural infection is generally unavailable. To enable tracking of both naturally-acquired and vaccine-induced immunity, we set up a synthetic random proxy based on routine hospital testing for estimating total Immunoglobulin G (IgG) prevalence in the sampled community. Our approach analyzes viral IgG testing data of asymptomatic patients who present for elective procedures within a hospital system. We apply multilevel regression and poststratification to adjust for demographic and geographic discrepancies between the sample and the community population. We then apply state-based vaccination data to categorize immune status as driven by natural infection or by vaccine. We have validated the model using verified clinical metrics of viral and symptomatic disease incidence to show the expected biological correlation of these entities with the timing, rate, and magnitude of seroprevalence. In mid-July 2021, the estimated immunity level was 74% with the administered vaccination rate of 45% in the two counties. The metric improves real-time understanding of immunity to COVID-19 as it evolves and the coordination of policy responses to the disease, toward an inexpensive and easily operational surveillance system that transcends the limits of vaccination datasets alone.

preprint, 2022, arXiv

Delivering data differently

Human-computer interaction relies on mouse/touchpad, keyboard, and screen, but tools have recently been developed that engage sound, smell, touch, muscular resistance, voice dialogue, balance, and multiple senses at once. How might these improvements impact upon the practice of statistics and data science? People with low vision may be better able to grasp and explore data. More generally, methods developed to enable this have the potential to allow sighted people to use more senses and become better analysts. We would like to adapt some of the wide range of available computer and sensory input/output technologies to transform data science workflows. Here is a vision of what this synthesis might accomplish.

preprint, 2022, arXiv

Pathfinder: Parallel quasi-Newton variational inference

We propose Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.

preprint, 2022, arXiv

The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning

Recent arguments that machine learning (ML) is facing a reproducibility and replication crisis suggest that some published claims in ML research cannot be taken at face value. These concerns inspire analogies to the replication crisis affecting the social and medical sciences. They also inspire calls for the integration of statistical approaches to causal inference and predictive modeling. A deeper understanding of what reproducibility concerns in supervised ML research have in common with the replication crisis in experimental science puts the new concerns in perspective, and helps researchers avoid "the worst of both worlds," where ML researchers begin borrowing methodologies from explanatory modeling without understanding their limitations and vice versa. We contribute a comparative analysis of concerns about inductive learning that arise in causal attribution as exemplified in psychology versus predictive modeling as exemplified in ML. We identify themes that re-occur in reform discussions, like overreliance on asymptotic theory and non-credible beliefs about real-world data generating processes. We argue that in both fields, claims from learning are implied to generalize outside the specific environment studied (e.g., the input dataset or subject sample, modeling implementation, etc.) but are often impossible to refute due to undisclosed sources of variance in the learning pipeline. In particular, errors being acknowledged in ML expose cracks in long-held beliefs that optimizing predictive accuracy using huge datasets absolves one from having to consider a true data generating process or formally represent uncertainty in performance claims. We conclude by discussing risks that arise when sources of errors are misdiagnosed and the need to acknowledge the role of human inductive biases in learning and reform.

preprint, 2022, arXiv

Using sex and gender in survey adjustment

Accounting for sex and gender is a challenge in social science research. While other methodology papers consider issues surrounding appropriate measurement, we consider the problem of adjustment for survey nonresponse and generalization from samples to populations in the context of the recent push toward measuring sex or gender as a non-binary construct. This is challenging not only in that response categories differ between sex and gender measurement, but also in that both these attributes are potentially multidimensional. We reflect on similarities to measuring race/ethnicity before considering the ethical and statistical implications of the options available to us. We present a simulation study to understand the statistical implications under a variety of scenarios, and demonstrate the application of the decision process with the New York City Poverty Tracker. Overall, we conclude not with a single best recommendation for all surveys but rather with an awareness of the complexity of the problem and the benefits and weaknesses of different approaches.

preprint, 2021, arXiv

Making the most of imprecise measurements: Changing patterns of arsenic concentrations in shallow wells of Bangladesh from laboratory and field data

Millions of people in Bangladesh drink well water contaminated with arsenic. Despite the severity of this health crisis, little is known about the extent to which groundwater arsenic concentrations change over time: Are concentrations generally rising, or is arsenic being flushed out of aquifers? Are spatial patterns of high and low concentrations across wells homogenizing over time, or are these spatial gradients becoming more pronounced? To address these questions, we analyze a large set of arsenic concentrations that were sampled within a 25 km$^2$ area of Bangladesh over time. We compare two blanket surveys collected in 2000/2001 and 2012/2013 from the same villages but relying on a largely different set of wells. The early set consists of 4574 accurate laboratory measurements, but the later set poses a challenge for analysis because it is composed of 8229 less accurate categorical measurements conducted in the field with a kit. We construct a Bayesian model that jointly calibrates the measurement errors, applies spatial smoothing, and describes the spatiotemporal dynamic with a diffusion-like process model. Our statistical analysis reveals that arsenic concentrations change over time and that their mean dropped from 110 to 96 $\mu$g/L over 12 years, although one quarter of individual wells are inferred to see an increase. The largest decreases occurred at the wells with locally high concentrations where the estimated Laplacian indicated that the arsenic surface was strongly concave. However, wells with initially low concentrations were unlikely to be contaminated by nearby high concentration wells over a decade. We validate the model using a posterior predictive check on an external subset of laboratory measurements from the same 271 wells in the same study area available for 2000, 2014, and 2015.

preprint, 2020, arXiv

Accounting for Uncertainty During a Pandemic

We discuss several issues of statistical design, data collection, analysis, communication, and decision making that have arisen in recent and ongoing coronavirus studies, focusing on tools for assessment and propagation of uncertainty. This paper does not purport to be a comprehensive survey of the research literature; rather, we use examples to illustrate statistical points that we think are important.

preprint, 2020, arXiv

Adaptive Path Sampling in Metastable Posterior Distributions

The normalizing constant plays an important role in Bayesian computation, and there is a large literature on methods for computing or approximating normalizing constants that cannot be evaluated in closed form. When the normalizing constant varies by orders of magnitude, methods based on importance sampling can require many rounds of tuning. We present an improved approach using adaptive path sampling, iteratively reducing gaps between the base and target. Using this adaptive strategy, we develop two metastable sampling schemes. They are automated in Stan and require little tuning. For a multimodal posterior density, we equip simulated tempering with a continuous temperature. For a funnel-shaped entropic barrier, we adaptively increase mass in bottleneck regions to form an implicit divide-and-conquer. Both approaches empirically perform better than existing methods for sampling from metastable distributions, including higher accuracy and computation efficiency.

preprint, 2020, arXiv

Bayesian aggregation of average data: An application in drug development

Throughout the different phases of a drug development program, randomized trials are used to establish the tolerability, safety, and efficacy of a candidate drug. At each stage one aims to optimize the design of future studies by extrapolation from the available evidence at the time. This includes collected trial data and relevant external data. However, relevant external data are typically available as averages only, for example from trials on alternative treatments reported in the literature. Here we report on such an example from a drug development for wet age-related macular degeneration. This disease is the leading cause of severe vision loss in the elderly. While current treatment options are efficacious, they are also a substantial burden for the patient. Hence, new treatments are under development which need to be compared against existing treatments. The general statistical problem this leads to is meta-analysis, which addresses the question of how we can combine datasets collected under different conditions. Bayesian methods have long been used to achieve partial pooling. Here we consider the challenge when the model of interest is complex (hierarchical and nonlinear) and one dataset is given as raw data while the second dataset is given as averages only. In such a situation, common meta-analytic methods can only be applied when the model is sufficiently simple for analytic approaches. When the model is too complex, for example nonlinear, an analytic approach is not possible. We provide a Bayesian solution by using simulation to approximately reconstruct the likelihood of the external summary and allowing the parameters in the model to vary under the different conditions. We first evaluate our approach using fake-data simulations and then report results for the drug development program that motivated this research.

preprint, 2020, arXiv

Bayesian hierarchical weighting adjustment and survey inference

We combine Bayesian prediction and weighted inference as a unified approach to survey inference. The general principles of Bayesian analysis imply that models for survey outcomes should be conditional on all variables that affect the probability of inclusion. We incorporate the weighting variables under the framework of multilevel regression and poststratification, as a byproduct generating model-based weights after smoothing. We investigate deep interactions and introduce structured prior distributions for smoothing and stability of estimates. The computation is done via Stan and implemented in the open source R package "rstanarm" ready for public use. Simulation studies illustrate that model-based prediction and weighting inference outperform classical weighting. We apply the proposal to the New York Longitudinal Study of Wellbeing. The new approach generates robust weights and increases efficiency for finite population inference, especially for subsets of the population.

preprint, 2020, arXiv

Improving multilevel regression and poststratification with structured priors

A central theme in the field of survey statistics is estimating population-level quantities through data coming from potentially non-representative samples of the population. Multilevel Regression and Poststratification (MRP), a model-based approach, is gaining traction against the traditional weighted approach for survey estimates. MRP estimates are susceptible to bias if there is an underlying structure that the methodology does not capture. This work aims to provide a new framework for specifying structured prior distributions that lead to bias reduction in MRP estimates. We use simulation studies to explore the benefit of these prior distributions and demonstrate their efficacy on non-representative US survey data. We show that structured prior distributions offer absolute bias reduction and variance reduction for posterior MRP estimates in a large variety of data regimes.

preprint, 2020, arXiv

Know your population and know your model: Using model-based regression and poststratification to generalize findings beyond the observed sample

Psychology research focuses on interactions, and this has deep implications for inference from non-representative samples. For the goal of estimating average treatment effects, we propose to fit a model allowing treatment to interact with background variables and then average over the distribution of these variables in the population. This can be seen as an extension of multilevel regression and poststratification (MRP), a method used in political science and other areas of survey research, where researchers wish to generalize from a sparse and possibly non-representative sample to the general population. In this paper, we discuss areas where this method can be used in the psychological sciences. We use our method to estimate the norming distribution for the Big Five Personality Scale using open source data. We argue that large open data sources like this and other collaborative data sources can be combined with MRP to help resolve current challenges of generalizability and replication in psychology.

preprint, 2017, arXiv

Bayesian Inference under Cluster Sampling with Probability Proportional to Size

Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design where the clusters are first selected with probability proportional to cluster size, and then units are randomly sampled inside selected clusters. Challenges arise when the sizes of nonsampled clusters are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes, with this inference performed simultaneously with the model for survey outcome. Simulation studies show that the integrated Bayesian approach outperforms classical methods with efficiency gains. We use Stan for computing and apply the proposal to the Fragile Families and Child Wellbeing study as an illustration of complex survey inference in health surveys.

preprint, 2016, arXiv

Automatic Differentiation Variational Inference

Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models: no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.

preprint, 2016, arXiv

Fitting Bayesian item response models in Stata and Stan

Stata users have access to two easy-to-use implementations of Bayesian inference: Stata's native bayesmh function and StataStan, which calls the general Bayesian engine Stan. We compare these on two models that are important for education research: the Rasch model and the hierarchical Rasch model. Stan (as called from Stata) fits a more general range of models than can be fit by bayesmh and is also more scalable, in that it could easily fit models with at least ten times more parameters than could be fit using Stata's native Bayesian implementation. In addition, Stan runs between two and ten times faster than bayesmh as measured in effective sample size per second: that is, compared to Stan, it takes Stata's built-in Bayesian engine twice to ten times as long to get inferences with equivalent precision. We attribute Stan's advantage in flexibility to its general modeling language, and its advantages in scalability and speed to an efficient sampling algorithm: Hamiltonian Monte Carlo using the no-U-turn sampler. In order to further investigate scalability, we also compared to the package Jags, which performed better than Stata's native Bayesian engine but not as well as StataStan. Given its advantages in speed, generality, and scalability, and that Stan is open-source and can be run directly from Stata using StataStan, we recommend that Stata users adopt Stan as their Bayesian inference engine of choice.

preprint, 2016, arXiv

The 2008 election: A preregistered replication analysis

We present an increasingly stringent set of replications of Ghitza & Gelman (2013), a multilevel regression and poststratification analysis of polls from the 2008 U.S. presidential election campaign, focusing on a set of plots showing the estimated Republican vote share for whites and for all voters, as a function of income level in each of the states. We start with a nearly-exact duplication that uses the posted code and changes only the model-fitting algorithm; we then replicate using already-analyzed data from 2004; and finally we set up preregistered replications using two surveys from 2008 that we had not previously looked at. We have already learned from our preliminary, non-preregistered replication, which has revealed a potential problem with the published analysis of Ghitza & Gelman (2013); it appears that our model may not sufficiently account for nonsampling error, and that some of the patterns presented in that earlier paper may simply reflect noise. In addition to the substantive interest in validating earlier findings about demographics, geography, and voting, the present project serves as a demonstration of preregistration in a setting where the subject matter is historical (and thus the replication data exist before the preregistration plan is written) and where the analysis is exploratory (and thus a replication cannot be simply deemed successful or unsuccessful based on the statistical significance of some particular comparison).

preprint, 2015, arXiv

A Model-Based Approach to Climate Reconstruction Using Tree-Ring Data

Quantifying long-term historical climate is fundamental to understanding recent climate change. Most instrumentally recorded climate data are only available for the past 200 years, so proxy observations from natural archives are often considered. We describe a model-based approach to reconstructing climate defined in terms of raw tree-ring measurement data that simultaneously accounts for non-climatic and climatic variability. In this approach we specify a joint model for the tree-ring data and climate variable that we fit using Bayesian inference. We consider a range of prior densities and compare the modeling approach to current methodology using an example case of Scots pine from Tornetrask, Sweden to reconstruct growing season temperature. We describe how current approaches translate into particular model assumptions. We explore how changes to various components in the model-based approach affect the resulting reconstruction. We show that minor changes in model specification can have little effect on model fit but lead to large changes in the predictions. In particular, the periods of relatively warmer and cooler temperatures are robust between models, but the magnitude of the resulting temperatures is highly model dependent. Such sensitivity may not be apparent with traditional approaches because the underlying statistical model is often hidden or poorly described.

preprint, 2015, arXiv

Age-aggregation bias in mortality trends

In a recent article in PNAS, Case and Deaton show a figure illustrating "a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013." The authors state that their numbers "are not age-adjusted within the 10-y 45-54 age group." They calculated the mortality rate each year by dividing the total number of deaths for the age group by the population of the age group. We suspected an aggregation bias. After adjusting for changes in age composition, we find there is no longer a steady increase in mortality rates for this age group. Instead there is an increasing trend from 1999-2005 and a constant trend thereafter. Moreover, stratifying age-adjusted mortality rates by sex shows a marked increase only for women and not men, contrary to the article's headline. We stress that this does not change a key finding of the Case and Deaton paper: the comparison of non-Hispanic U.S. middle-aged whites to other countries and other ethnic groups. These comparisons hold up after our age adjustment. While we do not believe that age-adjustment invalidates comparisons between countries, it does affect claims concerning the absolute increase in mortality among U.S. middle-aged white non-Hispanics. Breaking down the trends in this group by region of the country shows other interesting patterns: since 1999 there has been an increase in death rates among women in the south. In contrast, death rates for both sexes have been declining in the northeast, the region where mortality rates were lowest to begin with. These graphs demonstrate the value of this sort of data exploration, and we are grateful to Case and Deaton for focusing attention on these mortality trends.
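The aggregation bias described here can be illustrated with direct age standardization. The sketch below uses made-up numbers, not the data analyzed in the paper, and the function names `crude_rate` and `age_adjusted_rate` are hypothetical. Age-specific death rates are held constant across the two years while the group's age mix shifts older.

```python
def crude_rate(deaths, population):
    """Total deaths divided by total population for the whole age group."""
    return sum(deaths.values()) / sum(population.values())


def age_adjusted_rate(deaths, population, standard_weights):
    """Average of age-specific rates under a fixed reference age distribution."""
    return sum(
        standard_weights[age] * deaths[age] / population[age]
        for age in standard_weights
    )


# Fixed reference age distribution (an assumption for this example).
standard_weights = {"45-49": 0.5, "50-54": 0.5}

# Made-up data: the same age-specific rates (0.003 and 0.005) in both years.
pop_early = {"45-49": 100_000, "50-54": 100_000}
deaths_early = {"45-49": 300, "50-54": 500}
pop_late = {"45-49": 50_000, "50-54": 150_000}
deaths_late = {"45-49": 150, "50-54": 750}

crude_early = crude_rate(deaths_early, pop_early)
crude_late = crude_rate(deaths_late, pop_late)
adjusted_early = age_adjusted_rate(deaths_early, pop_early, standard_weights)
adjusted_late = age_adjusted_rate(deaths_late, pop_late, standard_weights)
print(crude_early, crude_late, adjusted_early, adjusted_late)
```

In this toy example the crude rate rises only because the group has aged, while the age-adjusted rate is unchanged.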

preprint, 2015, arXiv

Automatic Variational Inference in Stan

Variational inference is a scalable technique for approximate Bayesian inference. Deriving variational inference algorithms requires tedious model-specific calculations; this makes it difficult to automate. We propose an automatic variational inference algorithm, automatic differentiation variational inference (ADVI). The user only provides a Bayesian model and a dataset; nothing else. We make no conjugacy assumptions and support a broad class of models. The algorithm automatically determines an appropriate variational family and optimizes the variational objective. We implement ADVI in Stan (code available now), a probabilistic programming framework. We compare ADVI to MCMC sampling across hierarchical generalized linear models, nonconjugate matrix factorization, and a mixture model. We train the mixture model on a quarter million images. With ADVI we can use variational inference on any model we write in Stan.

preprint, 2015, arXiv

Bayesian Nonparametric Weighted Sampling Inference

It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Families and Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency, which works because we induce regularization for small cells and thus this is a way of automatically smoothing the highly variable weights.
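For orientation, a classical design-based estimate of a population mean under inverse-probability weights can be sketched as a weighted mean (the Hájek form). The abstract does not say which classical estimator was used as the comparison, so this choice is an assumption, and the function name `weighted_mean` and the data are hypothetical.

```python
def weighted_mean(outcomes, inclusion_probs):
    """Weighted mean with weights equal to 1 / inclusion probability."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical sample: outcomes and each unit's probability of being sampled.
outcomes = [1.0, 0.0, 1.0]
inclusion_probs = [0.5, 0.25, 0.25]

estimate = weighted_mean(outcomes, inclusion_probs)
print(estimate)
```

Units with small inclusion probabilities receive large weights, which is the source of the highly variable weights that the paper's model smooths.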

preprint, 2015, arXiv

Beyond subjective and objective in statistics

We argue that the words "objectivity" and "subjectivity" in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of these reformulations is that the replacement terms do not oppose each other. Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification.

preprint, 2015, arXiv

Design of the Millennium Villages Project Sampling Plan: a simulation study for a multi-module survey

The Millennium Villages Project (MVP) is a ten-year integrated rural development project implemented in ten sub-Saharan African sites. At its conclusion we will conduct an evaluation of its causal effect on a variety of development outcomes, measured via household surveys in treatment and comparison areas. Outcomes are measured by six survey modules, with sample sizes for each demographic group determined by budget, logistics, and the group's vulnerability. We design a sampling plan that aims to reduce effort for survey enumerators and maximize precision for all outcomes. We propose two-stage sampling designs, sampling households at the first stage, followed by a second stage sample that differs across demographic groups. Two-stage designs are usually constructed by simple random sampling (SRS) of households and proportional within-household sampling, or probability proportional to size sampling (PPS) of households with fixed sampling within each. No measure of household size is proportional for all demographic groups, putting PPS schemes at a disadvantage. The SRS schemes have the disadvantage that multiple individuals sampled per household decreases efficiency due to intra-household correlation. We conduct a simulation study (using both design- and model-based survey inference) to understand these tradeoffs and recommend a sampling plan for the Millennium Villages Project. Similar design issues arise in other studies with surveys that target different demographic groups.

preprint, 2014, arXiv

How Bayesian Analysis Cracked the Red-State, Blue-State Problem

In the United States as in other countries, political and economic divisions cut along geographic and demographic lines. Richer people are more likely to vote for Republican candidates while poorer voters lean Democratic; this is consistent with the positions of the two parties on economic issues. At the same time, richer states on the coasts are bastions of the Democrats, while most of the generally lower-income areas in the middle of the country strongly support Republicans. During a research project lasting several years, we reconciled these patterns by fitting a series of multilevel models to perform inference on geographic and demographic subsets of the population. We were using national survey data with relatively small samples in some states, ethnic groups and income categories; this motivated the use of Bayesian inference to partially pool between fitted models and local data. Previous, non-Bayesian analyses of income and voting had failed to connect individual and state-level patterns. Now that our analysis has been done, we believe it could be replicated using non-Bayesian methods, but Bayesian inference helped us crack the problem by directly handling the uncertainty that is inherent in working with sparse data.
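Partial pooling can be sketched in its simplest form as a precision-weighted compromise between a local average and an overall mean. This assumes a normal model with known variances, which is a textbook simplification and not the multilevel models fitted in the paper; the function name `partially_pooled_mean` and all numbers are hypothetical.

```python
def partially_pooled_mean(local_mean, n, sigma2_within, overall_mean, tau2_between):
    """Posterior mean of a group mean under a normal model with known variances."""
    data_precision = n / sigma2_within    # information in the local data
    prior_precision = 1.0 / tau2_between  # information from the other groups
    return (data_precision * local_mean + prior_precision * overall_mean) / (
        data_precision + prior_precision
    )


# Hypothetical state with only 4 respondents: the estimate is pulled halfway
# toward the overall mean because the two precisions happen to be equal.
small_state = partially_pooled_mean(2.0, 4, 4.0, 0.0, 1.0)
# The same local mean with 400 respondents is barely pulled at all.
large_state = partially_pooled_mean(2.0, 400, 4.0, 0.0, 1.0)
print(small_state, large_state)
```

The smaller the local sample, the more the estimate leans on the overall mean, which is why this kind of pooling helps with sparse states and demographic cells.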

preprint2014arXiv

The Mythical Swing Voter

The only acceptable form of polling in the multi-billion dollar survey research field utilizes representative samples. We argue that with proper statistical adjustment, non-representative polling can provide accurate predictions, and often in a much more timely and cost-effective fashion. We demonstrate this by applying multilevel regression and post-stratification (MRP) to a 2012 election survey on the Xbox gaming platform. Not only do the transformed top-line projections from this data closely track standard indicators, but we use the unique nature of the data's size and panel to answer a meaningful political puzzle. We find that reported swings in public opinion polls are generally not due to actual shifts in vote intention, but rather are the result of temporary periods of relatively low response rates among supporters of the reportedly slumping candidate. This work shows great promise for using non-representative polling to measure public opinion, and the first product of this new polling technique raises the possibility that decades of large, reported swings in public opinion, including the perennial "convention bounce," are mostly artifacts of sampling bias.
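
The poststratification step that the abstract refers to can be illustrated with a small arithmetic sketch. The cells, estimates, and counts below are invented for illustration and are not taken from the Xbox study; in a real MRP analysis the per-cell estimates would come from a fitted multilevel regression rather than being typed in by hand.

```python
def poststratify(cell_estimates, cell_counts):
    """Weighted average of per-cell estimates, weighted by cell counts."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total


# Hypothetical model-based estimates of candidate support per demographic cell.
cell_estimates = {
    ("18-29", "M"): 0.60,
    ("18-29", "F"): 0.65,
    ("65+", "M"): 0.40,
    ("65+", "F"): 0.45,
}

# Hypothetical cell counts in the (skewed) sample and in the target population.
sample_counts = {("18-29", "M"): 800, ("18-29", "F"): 100, ("65+", "M"): 50, ("65+", "F"): 50}
population_counts = {("18-29", "M"): 100, ("18-29", "F"): 100, ("65+", "M"): 400, ("65+", "F"): 400}

# Weighting by the sample composition reproduces the unrepresentative mix.
raw = poststratify(cell_estimates, sample_counts)
# Weighting by the population composition gives the poststratified estimate.
adjusted = poststratify(cell_estimates, population_counts)

print(f"sample-weighted: {raw:.4f}, poststratified: {adjusted:.4f}")
```

The same cell estimates give different top-line numbers depending on which composition supplies the weights, which is the adjustment for nonrepresentativeness in its simplest form.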

preprint2013arXiv

Convincing Evidence

Textbooks on statistics emphasize care and precision, via concepts such as reliability and validity in measurement, random sampling and treatment assignment in data collection, and causal identification and bias in estimation. But how do researchers decide what to believe and what to trust when choosing which statistical methods to use? How do they decide the credibility of methods? Statisticians and statistical practitioners seem to rely on a sense of anecdotal evidence based on personal experience and on the attitudes of trusted colleagues. Authorship, reputation, and past experience are thus central to decisions about statistical procedures.

preprint2013arXiv

Simulation-efficient shortest probability intervals

Bayesian highest posterior density (HPD) intervals can be estimated directly from simulations via empirical shortest intervals. Unfortunately, these can be noisy (that is, have a high Monte Carlo error). We derive an optimal weighting strategy using bootstrap and quadratic programming to obtain a more computationally stable HPD, or in general, shortest probability interval (Spin). We prove the consistency of our method. Simulation studies on a range of theoretical and real-data examples, some with symmetric and some with asymmetric posterior densities, show that intervals constructed using Spin have better coverage (relative to the posterior distribution) and lower Monte Carlo error than empirical shortest intervals. We implement the new method in an R package (SPIn) so it can be routinely used in post-processing of Bayesian simulations.
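
As a point of reference, the empirical shortest interval that the abstract takes as its noisy baseline can be sketched in a few lines. This is not the Spin weighting method or the SPIn package; the function name and the exponential test draws are assumptions made for illustration.

```python
import math
import random


def empirical_shortest_interval(draws, prob=0.95):
    """Shortest interval spanning ceil(prob * n) consecutive sorted draws."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws the interval must contain
    # Slide a window of k consecutive order statistics and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical "posterior" draws from a skewed (exponential) distribution.
rng = random.Random(0)
draws = [rng.expovariate(1.0) for _ in range(5000)]

low, high = empirical_shortest_interval(draws, prob=0.95)
print(f"empirical shortest 95% interval: ({low:.3f}, {high:.3f})")
```

For a skewed distribution the shortest interval hugs the mode instead of cutting equal tail areas, and its endpoints depend on individual order statistics, which is the source of the Monte Carlo noise that the abstract describes.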

preprint2013arXiv

Understanding predictive information criteria for Bayesian models

We review the Akaike, deviance, and Watanabe-Akaike information criteria from a Bayesian perspective, where the goal is to estimate expected out-of-sample prediction error using a bias-corrected adjustment of within-sample error. We focus on the choices involved in setting up these measures, and we compare them in three simple examples, one theoretical and two applied. The contribution of this review is to put all these information criteria into a Bayesian predictive context and to better understand, through small examples, how these methods can apply in practice.
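
The idea of a bias-corrected within-sample error can be made concrete with a sketch of one commonly used form of the Watanabe-Akaike criterion. The abstract gives no formulas, so the specific choices here are assumptions: the within-sample term is the log pointwise predictive density, the correction is the summed posterior variance of the pointwise log-likelihood, and the result is reported on the deviance scale. The input layout `log_lik[s][i]` is also an assumption.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(y_i | theta_s), draws s, data points i."""
    n_points = len(log_lik[0])
    # Regroup so that each column holds all posterior draws for one data point.
    columns = [[row[i] for row in log_lik] for i in range(n_points)]
    lppd = sum(log_mean_exp(col) for col in columns)   # within-sample fit
    p_waic = sum(variance(col) for col in columns)     # effective number of parameters
    return {"lppd": lppd, "p_waic": p_waic, "waic": -2.0 * (lppd - p_waic)}


# Hypothetical log-likelihood values: 3 posterior draws, 2 data points.
example = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
print(waic(example))
```

At least two posterior draws are needed, since the sample variance is undefined for a single draw.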

preprint2012arXiv

In praise of the referee

There has been a lively debate in many fields, including statistics and related applied fields such as psychology and biomedical research, on possible reforms of the scholarly publishing system. Currently, referees contribute so much to improve scientific papers, both directly through constructive criticism and indirectly through the threat of rejection. We discuss ways in which new approaches to journal publication could continue to make use of the valuable efforts of peer reviewers.

preprint2012arXiv

On the Stationary Distribution of Iterative Imputations

Iterative imputation, in which variables are imputed one at a time each given a model predicting from all the others, is a popular technique that can be convenient and flexible, as it replaces a potentially difficult multivariate modeling problem with relatively simple univariate regressions. In this paper, we begin to characterize the stationary distributions of iterative imputations and their statistical properties. More precisely, when the conditional models are compatible (defined in the text), we give a set of sufficient conditions under which the imputation distribution converges in total variation to the posterior distribution of a Bayesian model. When the conditional models are incompatible but are valid, we show that the combined imputation estimator is consistent.

preprint2012arXiv

The anti-Bayesian moment and its passing

The present article is the reply to the discussion of our earlier "Not only defended but also applied" (arXiv:1006.5366, to appear in The American Statistician) that arose from our memory of a particularly intemperate anti-Bayesian statement in Feller's beautiful and classic book on probability theory. We felt that it was worth exploring the very extremeness of Feller's words, along with similar anti-Bayesian remarks by others, in order to better understand the background underlying controversies that still exist regarding the foundations of statistics. We thank the four discussants of our article for their contributions to our understanding of these controversies as they have existed in the past and persist today.

preprint2011arXiv

Bayesian Statistical Pragmatism

Discussion of "Statistical Inference: The Big Picture" by R. E. Kass [arXiv:1106.2895]

preprint2011arXiv

Inherent Difficulties of Non-Bayesian Likelihood-based Inference, as Revealed by an Examination of a Recent Book by Aitkin

For many decades, statisticians have made attempts to prepare the Bayesian omelette without breaking the Bayesian eggs; that is, to obtain probabilistic likelihood-based inferences without relying on informative prior distributions. A recent example is Murray Aitkin's book, Statistical Inference, which presents an approach to statistical hypothesis testing based on comparisons of posterior distributions of likelihoods under competing models. Aitkin develops and illustrates his method using some simple examples of inference from iid data and two-way tests of independence. We analyze in this note some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective and why we do not find it relevant for applied work.

preprint2011arXiv

Philosophy and the practice of Bayesian statistics

A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.

preprint2011arXiv

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as and sometimes more efficiently than a well-tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" sampling algorithms.
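
The two tuning parameters named in the abstract, the step size ε and the number of steps L, are easiest to see in a sketch of standard HMC with a leapfrog integrator. This is plain fixed-L HMC on a one-dimensional standard normal target, not NUTS: the recursive tree building and the step size adaptation are not shown. The function names, the target, and the values `eps=0.2` and `n_steps=10` are assumptions chosen for illustration.

```python
import math
import random


def leapfrog(theta, r, grad_log_p, eps, n_steps):
    """Simulate Hamiltonian dynamics with n_steps leapfrog steps of size eps."""
    r = r + 0.5 * eps * grad_log_p(theta)        # initial half step for momentum
    for step in range(n_steps):
        theta = theta + eps * r                  # full step for position
        if step < n_steps - 1:
            r = r + eps * grad_log_p(theta)      # full step for momentum
    r = r + 0.5 * eps * grad_log_p(theta)        # final half step for momentum
    return theta, r


def hmc_step(theta, log_p, grad_log_p, eps, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    r0 = rng.gauss(0.0, 1.0)
    theta_new, r_new = leapfrog(theta, r0, grad_log_p, eps, n_steps)
    # Log acceptance ratio: change in log joint density of (theta, r).
    log_accept = (log_p(theta_new) - 0.5 * r_new ** 2) - (log_p(theta) - 0.5 * r0 ** 2)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return theta_new
    return theta


def log_p(x):
    """Log density of a standard normal, up to a constant (illustrative target)."""
    return -0.5 * x * x


def grad_log_p(x):
    """Gradient of log_p."""
    return -x


rng = random.Random(1)
theta = 0.0
samples = []
for _ in range(5000):
    theta = hmc_step(theta, log_p, grad_log_p, eps=0.2, n_steps=10, rng=rng)
    samples.append(theta)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

Changing `n_steps` while holding `eps` fixed shows the tradeoff the abstract describes: very short trajectories move the chain only slightly per iteration, while very long ones spend gradient evaluations without improving the samples.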

preprint2010arXiv

"How many zombies do you know?" Using indirect survey methods to measure alien attacks and outbreaks of the undead

The zombie menace has so far been studied only qualitatively or through the use of mathematical models without empirical content. We propose to use a new tool in survey research to allow zombies to be studied indirectly without risk to the interviewers.

preprint2010arXiv

Bayes, Jeffreys, Prior Distributions and the Philosophy of Statistics

Discussion of "Harold Jeffreys's Theory of Probability revisited," by Christian Robert, Nicolas Chopin, and Judith Rousseau, for Statistical Science [arXiv:0804.3173]

preprint2010arXiv

Bayesian Statistics Then and Now

Discussion of "The Future of Indirect Evidence" by Bradley Efron [arXiv:1012.1161]

preprint2010arXiv

Causality and Statistical Learning

We review some approaches and philosophies of causal inference coming from sociology, economics, computer science, cognitive science, and statistics.

Andrew Gelman

39 published item(s)

Multilevel Regression and Poststratification Interface: An Application to Track Community-level COVID-19 Viral Transmission

Beyond Vaccination Rates: A Synthetic Random Proxy Metric of Total SARS-CoV-2 Immunity Seroprevalence in the Community

Delivering data differently

Pathfinder: Parallel quasi-Newton variational inference

The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning

Using sex and gender in survey adjustment

Making the most of imprecise measurements: Changing patterns of arsenic concentrations in shallow wells of Bangladesh from laboratory and field data

Accounting for Uncertainty During a Pandemic

Adaptive Path Sampling in Metastable Posterior Distributions

Bayesian aggregation of average data: An application in drug development

Bayesian hierarchical weighting adjustment and survey inference

Improving multilevel regression and poststratification with structured priors

Know your population and know your model: Using model-based regression and poststratification to generalize findings beyond the observed sample

Bayesian Inference under Cluster Sampling with Probability Proportional to Size

Automatic Differentiation Variational Inference

Fitting Bayesian item response models in Stata and Stan

The 2008 election: A preregistered replication analysis

A Model-Based Approach to Climate Reconstruction Using Tree-Ring Data

Age-aggregation bias in mortality trends

Automatic Variational Inference in Stan

Bayesian Nonparametric Weighted Sampling Inference

Beyond subjective and objective in statistics

Design of the Millennium Villages Project Sampling Plan: a simulation study for a multi-module survey

How Bayesian Analysis Cracked the Red-State, Blue-State Problem

The Mythical Swing Voter

Convincing Evidence

Simulation-efficient shortest probability intervals

Understanding predictive information criteria for Bayesian models

In praise of the referee

On the Stationary Distribution of Iterative Imputations

The anti-Bayesian moment and its passing

Bayesian Statistical Pragmatism

Inherent Difficulties of Non-Bayesian Likelihood-based Inference, as Revealed by an Examination of a Recent Book by Aitkin

Philosophy and the practice of Bayesian statistics

The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

"How many zombies do you know?" Using indirect survey methods to measure alien attacks and outbreaks of the undead

Bayes, Jeffreys, Prior Distributions and the Philosophy of Statistics

Bayesian Statistics Then and Now

Causality and Statistical Learning