Source author record

Eric-Jan Wagenmakers

Eric-Jan Wagenmakers appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Methodology, stat.OT, Applications, math.ST, Statistics Theory

Catalog footprint

8 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

Bayes Factors for Peri-Null Hypotheses

A perennial objection against Bayes factor point-null hypothesis tests is that the point-null hypothesis is known to be false from the outset. We examine the consequences of approximating the sharp point-null hypothesis by a hazy 'peri-null' hypothesis instantiated as a narrow prior distribution centered on the point of interest. The peri-null Bayes factor then equals the point-null Bayes factor multiplied by a correction term which is itself a Bayes factor. For moderate sample sizes, the correction term is relatively inconsequential; however, for large sample sizes the correction term becomes influential and causes the peri-null Bayes factor to be inconsistent and approach a limit that depends on the ratio of prior ordinates evaluated at the maximum likelihood estimate. We characterize the asymptotic behavior of the peri-null Bayes factor and briefly discuss suggestions on how to construct peri-null Bayes factor hypothesis tests that are also consistent.
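The factorization and the large-sample limit described in this abstract can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a normal mean with known unit variance, a N(0, 1) prior under the alternative, and a narrow N(0, 0.01) prior as the peri-null. All function names and default values are hypothetical. The computation is done on the log scale because the point-null marginal likelihood underflows for large n.

```python
import math

def log_normal_pdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - x * x / (2.0 * var)

def log_marginal(ybar, n, prior_var):
    # Assumed model: ybar | theta ~ N(theta, 1/n) and theta ~ N(0, prior_var),
    # so marginally ybar ~ N(0, prior_var + 1/n).
    return log_normal_pdf(ybar, prior_var + 1.0 / n)

def log_bayes_factors(ybar, n, alt_var=1.0, peri_var=0.01):
    """Return (log point-null BF, log correction term, log peri-null BF)."""
    log_m_alt = log_marginal(ybar, n, alt_var)      # alternative hypothesis
    log_m_peri = log_marginal(ybar, n, peri_var)    # peri-null hypothesis
    log_m_point = log_normal_pdf(ybar, 1.0 / n)     # point null (theta = 0)
    log_bf_point = log_m_alt - log_m_point          # alternative vs point null
    log_correction = log_m_point - log_m_peri       # point null vs peri-null
    log_bf_peri = log_m_alt - log_m_peri            # alternative vs peri-null
    return log_bf_point, log_correction, log_bf_peri

def prior_ordinate_ratio(theta_hat, alt_var=1.0, peri_var=0.01):
    """Ratio of the two prior densities evaluated at the estimate theta_hat."""
    return math.exp(log_normal_pdf(theta_hat, alt_var)
                    - log_normal_pdf(theta_hat, peri_var))

if __name__ == "__main__":
    ybar = 0.3  # in this model the maximum likelihood estimate equals ybar
    for n in (10, 1_000, 100_000, 100_000_000):
        _, _, log_bf_peri = log_bayes_factors(ybar, n)
        print(n, math.exp(log_bf_peri))
    print("limit:", prior_ordinate_ratio(ybar))
```

In this toy setting the peri-null Bayes factor stops growing as n increases and settles at the ratio of prior ordinates, whereas the point-null Bayes factor for the same data keeps growing without bound.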

Preprint, 2022, arXiv

Combine Statistical Thinking With Open Scientific Practice: A Protocol of a Bayesian Research Project

Current developments in the statistics community suggest that modern statistics education should be structured holistically, that is, by allowing students to work with real data and to answer concrete statistical questions, but also by educating them about alternative frameworks, such as Bayesian inference. In this article, we describe how we incorporated such a holistic structure in a Bayesian research project on ordered binomial probabilities. The project was conducted with a group of three undergraduate psychology students who had basic knowledge of Bayesian statistics and programming, but lacked formal mathematical training. The research project aimed to (1) convey the basic mathematical concepts of Bayesian inference; (2) have students experience the entire empirical cycle, including collection, analysis, and interpretation of data; and (3) teach students open science practices.

Preprint, 2022, arXiv

Default Bayes Factors for Testing the (In)equality of Several Population Variances

Testing the (in)equality of variances is an important problem in many statistical applications. We develop default Bayes factor tests to assess the (in)equality of two or more population variances, as well as a test for whether the population variances equal a specific value. The resulting test can be used to check assumptions for commonly used procedures such as the $t$-test or ANOVA, or test substantive hypotheses concerning variances directly. We show that our Bayes factor fulfills a number of desiderata. Researchers may have directed hypotheses such as $\sigma_{1}^{2} > \sigma_{2}^{2}$, they may want to extend $\mathcal{H}_{0}$ to have a null-region, or wish to combine hypotheses about equality with hypotheses about inequality, for example $\sigma_{1}^{2} = \sigma_{2}^{2} > (\sigma_{3}^{2}, \sigma_{4}^{2})$. We extend our Bayes factor test to allow for these deviations from our proposed default and illustrate it on a number of practical examples. Our procedure is implemented in the R package bfvartest.

Preprint, 2022, arXiv

History and Nature of the Jeffreys-Lindley Paradox

The Jeffreys-Lindley paradox exposes a rift between Bayesian and frequentist hypothesis testing that strikes at the heart of statistical inference. Contrary to what most current literature suggests, the paradox was central to the Bayesian testing methodology developed by Sir Harold Jeffreys in the late 1930s. Jeffreys showed that the evidence against a point-null hypothesis $\mathcal{H}_0$ scales with $\sqrt{n}$ and repeatedly argued that it would therefore be mistaken to set a threshold for rejecting $\mathcal{H}_0$ at a constant multiple of the standard error. Here we summarize Jeffreys's early work on the paradox and clarify his reasons for including the $\sqrt{n}$ term. The prior distribution is seen to play a crucial role; by implicitly correcting for selection, small parameter values are identified as relatively surprising under $\mathcal{H}_1$. We highlight the general nature of the paradox by presenting both a fully frequentist and a fully Bayesian version. We also demonstrate that the paradox does not depend on assigning prior mass to a point hypothesis, as is commonly believed.
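The $\sqrt{n}$ scaling mentioned in this abstract can be seen in a small numerical sketch. It assumes a setup that is not taken from the paper: a normal mean with known unit variance, a point null at zero, and a N(0, tau^2) prior under the alternative. The function names are hypothetical. Holding the test statistic fixed at z = 1.96, so that the two-sided p-value stays near .05, the Bayes factor in favor of the null grows roughly in proportion to the square root of the sample size.

```python
import math

def bf01(z, n, tau=1.0):
    """Bayes factor for H0: theta = 0 over H1: theta ~ N(0, tau^2).

    Assumed model: the sample mean is N(theta, 1/n) and z = sqrt(n) * mean.
    """
    r = n * tau * tau
    # Ratio of the N(0, 1/n) density to the N(0, tau^2 + 1/n) density at the mean.
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))

def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

if __name__ == "__main__":
    z = 1.96
    print("p-value:", two_sided_p(z))
    for n in (100, 10_000, 1_000_000):
        print(n, bf01(z, n))
```

The same "just significant" z-value therefore yields increasingly strong support for the null as n grows, which is why a rejection threshold set at a constant multiple of the standard error disagrees with the Bayes factor at large sample sizes.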

Preprint, 2022, arXiv

When Evidence and Significance Collide

Null hypothesis statistical significance testing (NHST) is the dominant approach for evaluating results from randomized controlled trials. Whereas NHST comes with long-run error rate guarantees, its main inferential tool -- the $p$-value -- is only an indirect measure of evidence against the null hypothesis. The main reason is that the $p$-value is based on the assumption that the null hypothesis is true, whereas the likelihood of the data under any alternative hypothesis is ignored. If the goal is to quantify how much evidence the data provide for or against the null hypothesis, it is unavoidable that an alternative hypothesis be specified (Goodman & Royall, 1988). Paradoxes arise when researchers interpret $p$-values as evidence. For instance, results that are surprising under the null may be equally surprising under a plausible alternative hypothesis, such that a $p=.045$ result ('reject the null') does not make the null any less plausible than it was before. Hence, $p$-values have been argued to overestimate the evidence against the null hypothesis. Conversely, it can be the case that statistically non-significant results (i.e., $p>.05$) nevertheless provide some evidence in favor of the alternative hypothesis. It is therefore crucial for researchers to know when statistical significance and evidence collide, and this requires that a direct measure of evidence is computed and presented alongside the traditional $p$-value.

Preprint, 2016, arXiv

J. B. S. Haldane's Contribution to the Bayes Factor Hypothesis Test

This article brings attention to some historical developments that gave rise to the Bayes factor for testing a point null hypothesis against a composite alternative. In line with current thinking, we find that the conceptual innovation - to assign prior mass to a general law - is due to a series of three articles by Dorothy Wrinch and Sir Harold Jeffreys (1919, 1921, 1923). However, our historical investigation also suggests that in 1932 J. B. S. Haldane made an important contribution to the development of the Bayes factor by proposing the use of a mixture prior comprising a point mass and a continuous probability density. Jeffreys was aware of Haldane's work and it may have inspired him to pursue a more concrete statistical implementation for his conceptual ideas. It thus appears that Haldane may have played a much bigger role in the statistical development of the Bayes factor than has hitherto been assumed.

Preprint, 2015, arXiv

Hidden Multiplicity in Multiway ANOVA: Prevalence and Remedies

Many psychologists do not realize that exploratory use of the popular multiway analysis of variance (ANOVA) harbors a multiple comparison problem. In the case of two factors, three separate null hypotheses are subject to test (i.e., two main effects and one interaction). Consequently, the probability of at least one Type I error (if all null hypotheses are true) is 14% rather than 5% if the three tests are independent. We explain the multiple comparison problem and demonstrate that researchers almost never correct for it. To mitigate the problem, we describe four remedies: the omnibus F test, the control of familywise error rate, the control of false discovery rate, and the preregistration of hypotheses.
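The 14% figure follows from the standard formula for the probability of at least one Type I error across independent tests, 1 - (1 - alpha)^m. A minimal sketch of that arithmetic is given below. The function names are hypothetical, and the Bonferroni adjustment is shown only as one common way to control the familywise error rate, not necessarily the procedure the authors recommend.

```python
def n_tests_full_factorial(n_factors):
    """Main effects plus all interactions in a full factorial ANOVA: 2^k - 1."""
    return 2 ** n_factors - 1

def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

if __name__ == "__main__":
    m = n_tests_full_factorial(2)                 # two factors give three tests
    print(m, familywise_error_rate(0.05, m))      # roughly 0.14
    corrected = bonferroni_alpha(0.05, m)
    print(corrected, familywise_error_rate(corrected, m))  # stays below 0.05
```

With three factors the same formula covers seven tests, so the uncorrected familywise error rate rises further.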

Preprint, 2015, arXiv

Testing Order Constraints: Qualitative Differences Between Bayes Factors and Normalized Maximum Likelihood

We compared Bayes factors to normalized maximum likelihood for the simple case of selecting between an order-constrained versus a full binomial model. This comparison revealed two qualitative differences in testing order constraints regarding data dependence and model preference.
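One way to make the Bayes factor side of this comparison concrete is a Monte Carlo sketch for two binomial rates. It is an illustration under stated assumptions and not necessarily the computation used in the paper: independent uniform priors on both rates, the order constraint theta1 > theta2, and the identity that the Bayes factor of the constrained model against the full model equals the posterior probability of the constraint divided by its prior probability. The function name and defaults are hypothetical.

```python
import random

def order_constraint_bf(k1, n1, k2, n2, draws=20_000, seed=1):
    """Estimate the Bayes factor of theta1 > theta2 against the full model.

    Assumes independent Uniform(0, 1) priors, so each posterior is
    Beta(1 + successes, 1 + failures) and the prior probability of the
    constraint is 1/2.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        theta1 = rng.betavariate(1 + k1, 1 + n1 - k1)
        theta2 = rng.betavariate(1 + k2, 1 + n2 - k2)
        if theta1 > theta2:
            hits += 1
    posterior_prob = hits / draws
    prior_prob = 0.5
    return posterior_prob / prior_prob

if __name__ == "__main__":
    # Data that agree with the constraint: the estimate approaches its cap of 2.
    print(order_constraint_bf(18, 20, 2, 20))
    # Data that contradict the constraint: the estimate approaches 0.
    print(order_constraint_bf(2, 20, 18, 20))
```

Under these assumptions the support for the order constraint is bounded by the reciprocal of its prior probability, while the evidence against it has no such bound.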
