Source author record

Roger D. Peng

Roger D. Peng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Applications, cs.CY, Methodology, Quantitative Methods, stat.OT

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

Inside Out: Externalizing Assumptions in Data Analysis as Validation Checks

In data analysis, unexpected results often prompt researchers to revisit their procedures to identify potential issues. While some researchers may struggle to identify the root causes, experienced researchers can often quickly diagnose problems by checking a few key assumptions. These checked assumptions, or expectations, are typically informal, difficult to trace, and rarely discussed in publications. In this paper, we introduce the term *analysis validation checks* to formalize and externalize these informal assumptions. We then introduce a procedure to identify a subset of checks that best predict the occurrence of unexpected outcomes, based on simulations of the original data. The checks are evaluated in terms of accuracy, determined by binary classification metrics, and independence, which measures the shared information among checks. We demonstrate this approach with a toy example using step count data and a generalized linear model example examining the effect of particulate matter air pollution on daily mortality.
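The evaluation described in this abstract can be illustrated with a minimal sketch. The abstract names binary classification metrics and shared information among checks but not the specific measures, so precision, recall, and mutual information below are assumptions, and the function names are hypothetical. Each list holds one boolean per simulated dataset.

```python
from math import log2

def precision_recall(check_fired, unexpected):
    """Score one validation check against simulated outcomes.

    check_fired[i] is True when the check flagged simulation i;
    unexpected[i] is True when simulation i gave an unexpected result.
    """
    tp = sum(1 for c, u in zip(check_fired, unexpected) if c and u)
    fp = sum(1 for c, u in zip(check_fired, unexpected) if c and not u)
    fn = sum(1 for c, u in zip(check_fired, unexpected) if not c and u)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def mutual_information(a, b):
    """Mutual information (in bits) between two boolean check outcomes.

    A value near zero suggests the two checks carry little shared
    information; this is one assumed way to quantify independence.
    """
    n = len(a)
    mi = 0.0
    for x in (False, True):
        for y in (False, True):
            pxy = sum(1 for i, j in zip(a, b) if i == x and j == y) / n
            px = sum(1 for i in a if i == x) / n
            py = sum(1 for j in b if j == y) / n
            if pxy > 0:
                mi += pxy * log2(pxy / (px * py))
    return mi
```

Under these assumptions, a check with high precision and recall that shares little information with the checks already selected would be a natural candidate for the subset.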

Preprint, 2022, arXiv

Implications of Mortality Displacement for Effect Modification and Selection Bias

Mortality displacement is the concept that exposure moves deaths forward in time (e.g., by a few days, several months, or years) relative to when they would have occurred without the exposure; it is common in environmental time-series studies. Using concepts of a frail population and loss of life expectancy, it is understood that mortality displacement may decrease the rate ratio (RR). Such decreases are thought to be minimal or substantial depending on the study population. Environmental epidemiologists have interpreted RR with mortality displacement in mind. This theoretical paper reveals that mortality displacement can be formulated as a built-in selection bias of RR in Cox models due to unmeasured risk factors independent of the exposure of interest, and that mortality displacement can also be viewed as an effect modifier by integrating the concepts of rate and loss of life expectancy. Thus, depending on the framework through which we view bias, mortality displacement can be categorized as selection bias in the bias taxonomy of epidemiology, and simultaneously it can be seen as an effect modifier. This dichotomy provides useful implications regarding policy, effect modification, exposure time-window selection, and generalizability, specifically why research in epidemiology may produce unexpected and heterogeneous RRs across different studies and sub-populations.

Preprint, 2020, arXiv

Reproducible Research: A Retrospective

Rapid advances in computing technology over the past few decades have spurred two extraordinary phenomena in science: large-scale and high-throughput data collection coupled with the creation and implementation of complex statistical algorithms for data analysis. Together, these two phenomena have brought about tremendous advances in scientific discovery but have also raised two serious concerns, one relatively new and one quite familiar. The complexity of modern data analyses raises questions about the reproducibility of the analyses, meaning the ability of independent analysts to re-create the results claimed by the original authors using the original data and analysis techniques. While seemingly a straightforward concept, reproducibility of analyses is typically thwarted by the lack of availability of the data and computer code that were used in the analyses. A much more general concern is the replicability of scientific findings, which concerns the frequency with which scientific claims are confirmed by completely independent investigations. While the concepts of reproducibility and replicability are related, it is worth noting that they are focused on quite different goals and address different aspects of scientific progress. In this review, we will discuss the origins of reproducible research, characterize the current status of reproducibility in public health research, and connect reproducibility to current concerns about replicability of scientific findings. Finally, we describe a path forward for improving both the reproducibility and replicability of public health research in the future.

Preprint, 2015, arXiv

A glass half full interpretation of the replicability of psychological science

A recent study of the replicability of key psychological findings is a major contribution toward understanding the human side of the scientific process. Despite the careful and nuanced analysis reported in the paper, mass and social media adhered to the simple narrative that only 36% of the studies replicated their original results. Here we show that 77% of the replication effect sizes reported were within a prediction interval based on the original effect size. In this light, the results of the Reproducibility Project: Psychology can be viewed as a positive result for the scientific process.
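To make the prediction-interval idea concrete, here is a minimal sketch of one common construction. The abstract does not state how the interval is built, so this assumes effect sizes are correlation coefficients and uses the Fisher z-transform with a 95% level; the function names and sample sizes are hypothetical.

```python
from math import atanh, sqrt, tanh

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution

def prediction_interval(r_orig, n_orig, n_rep, z=Z_975):
    """Approximate 95% prediction interval for a replication correlation.

    Assumption: on the Fisher z scale, a correlation estimated from n
    observations has variance about 1 / (n - 3). The difference between
    the original and replication estimates then has variance
    1 / (n_orig - 3) + 1 / (n_rep - 3).
    """
    center = atanh(r_orig)
    se = sqrt(1.0 / (n_orig - 3) + 1.0 / (n_rep - 3))
    # Transform the interval endpoints back to the correlation scale.
    return tanh(center - z * se), tanh(center + z * se)

def within_interval(r_orig, n_orig, r_rep, n_rep):
    """True when the replication effect falls inside the interval."""
    lower, upper = prediction_interval(r_orig, n_orig, n_rep)
    return lower <= r_rep <= upper
```

The interval accounts for sampling error in both the original and the replication estimate, so it is wider than a confidence interval around the original effect alone. That is why a replication can fall inside it even when it does not reach statistical significance on its own.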

Preprint, 2015, arXiv

Reproducible Research Can Still Be Wrong: Adopting a Prevention Approach

Reproducibility, the ability to recompute results, and replicability, the chances other experimenters will achieve a consistent result, are two foundational characteristics of successful scientific research. Consistent findings from independent investigators are the primary means by which scientific evidence accumulates for or against a hypothesis. And yet, of late there has been a crisis of confidence among researchers worried about the rate at which studies are either reproducible or replicable. In order to maintain the integrity of scientific research and the public's trust in science, the scientific community must ensure reproducibility and replicability by engaging in a more preventative approach that greatly expands data analysis education and routinely employs software tools.
