Source author record

Sara Algeri

Sara Algeri appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Methodology, physics.data-an, Applications, astro-ph.HE, hep-ph, astro-ph.IM, hep-ex, Computation, math.ST, Statistics Theory

Catalog footprint

What is connected

10 works

10 topics

4 close collaborators

Preprint, 2022, arXiv

Estimating the lifetime risk of a false positive screening test result

False positive results in screening tests have potentially severe psychological, medical, and financial consequences for the recipient. However, there have been few efforts to quantify how the risk of a false positive accumulates over time. We seek to fill this gap by estimating the probability that an individual who adheres to the U.S. Preventive Services Task Force (USPSTF) screening guidelines will receive at least one false positive in a lifetime. To do so, we assembled a data set of 116 studies cited by the USPSTF that report the number of true positives, false negatives, true negatives, and false positives for the primary screening procedure for one of five cancers or six sexually transmitted diseases. We use these data to estimate the probability that an individual in one of 14 demographic subpopulations will receive at least one false positive for one of these eleven diseases in a lifetime. We specify a suitable statistical model to account for the hierarchical structure of the data, and we use the parametric bootstrap to quantify the uncertainty surrounding our estimates. The estimated probability of receiving at least one false positive in a lifetime is 85.5% ($\pm$0.9%) and 38.9% ($\pm$3.6%) for baseline groups of women and men, respectively. It is higher for subpopulations recommended to screen more frequently than the baseline, including more vulnerable groups such as pregnant women and men who have sex with men. Since screening technology is imperfect, false positives remain inevitable. The high lifetime risk of a false positive reveals the importance of educating patients about this phenomenon.
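The way this risk accumulates over repeated tests can be illustrated with a short calculation. The sketch below assumes that all screening tests are independent and that each has a fixed false positive rate. This is a simplification, not the hierarchical model or parametric bootstrap described in the abstract. The function name and the example schedule are hypothetical and are not taken from the paper.

```python
from math import prod


def lifetime_false_positive_risk(schedule):
    """Probability of at least one false positive over a screening schedule.

    `schedule` is an iterable of (false_positive_rate, number_of_tests) pairs.
    Assumes every test is independent of the others (a simplification).
    """
    # Probability that none of the tests returns a false positive.
    p_no_false_positive = prod((1.0 - rate) ** count for rate, count in schedule)
    return 1.0 - p_no_false_positive


# Hypothetical inputs for illustration only; these are not values from the paper.
example_schedule = [(0.05, 20), (0.01, 10)]
example_risk = lifetime_false_positive_risk(example_schedule)
print(f"Lifetime risk under the independence assumption: {example_risk:.3f}")
```

Under these assumptions, even a modest per-test false positive rate compounds quickly when a test is repeated many times.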

Preprint, 2022, arXiv

Informative Goodness-of-Fit for Multivariate Distributions

This article introduces an informative goodness-of-fit (iGOF) approach to study multivariate distributions. When the null model is rejected, iGOF allows us to identify the underlying sources of mismodeling and naturally equips practitioners with additional insights on the nature of the deviations from the true distribution. The informative character of the procedure is achieved by exploiting smooth tests and random fields theory to facilitate the analysis of multivariate data. Simulation studies show that iGOF enjoys high power for different types of alternatives. The methods presented here directly address the problem of background mismodeling arising in physics and astronomy. It is in these areas that the motivation of this work is rooted.

Preprint, 2022, arXiv

K-2 rotated goodness-of-fit for multivariate data

Consider a set of multivariate distributions, $F_1,\dots,F_M$, aiming to explain the same phenomenon. For instance, each $F_m$ may correspond to a different candidate background model for calibration data, or to one of many possible signal models we aim to validate on experimental data. In this article, we show that tests for a wide class of apparently different models $F_{m}$ can be mapped into a single test for a reference distribution $Q$. As a result, valid inference for each $F_m$ can be obtained by simulating only the distribution of the test statistic under $Q$. Furthermore, $Q$ can be chosen to be conveniently simple, which substantially reduces the computational time.

Preprint, 2021, arXiv

Exhaustive goodness-of-fit via smoothed inference and graphics

Classical tests of goodness-of-fit aim to validate the conformity of a postulated model to the data under study. Given their inferential nature, they can be considered a crucial step in confirmatory data analysis. In their standard formulation, however, they do not allow exploring how the hypothesized model deviates from the truth nor do they provide any insight into how the rejected model could be improved to better fit the data. The main goal of this work is to establish a comprehensive framework for goodness-of-fit which naturally integrates modeling, estimation, inference, and graphics. Modeling and estimation focus on a novel formulation of smooth tests that easily extends to arbitrary distributions, either continuous or discrete. Inference and adequate post-selection adjustments are performed via a specially designed smoothed bootstrap and the results are summarized via an exhaustive graphical tool called CD-plot.

Preprint, 2019, arXiv

Detecting new signals under background mismodelling

Searches for new astrophysical phenomena often involve several sources of non-random uncertainties which can lead to highly misleading results. Among these, model uncertainty arising from background mismodelling can dramatically compromise the sensitivity of the experiment under study. Specifically, overestimating the background distribution in the signal region increases the chances of missing new physics. Conversely, underestimating the background outside the signal region leads to an artificially enhanced sensitivity and a higher likelihood of claiming false discoveries. The aim of this work is to provide a unified statistical strategy to perform modelling, estimation, inference, and signal characterization under background mismodelling. The proposed method makes it possible to incorporate the (partial) scientific knowledge available on the background distribution and provides a data-updated version of it in a purely nonparametric fashion without requiring the specification of prior distributions on the parameters. Applications in the context of dark matter searches and radio surveys show how the tools presented in this article can be used to incorporate non-stochastic uncertainty due to instrumental noise and to overcome violations of classical distributional assumptions in stacking experiments.

Preprint, 2019, arXiv

Searching for new physics with profile likelihoods: Wilks and beyond

Particle physics experiments use likelihood ratio tests extensively to compare hypotheses and to construct confidence intervals. Often, the null distribution of the likelihood ratio test statistic is approximated by a $χ^2$ distribution, following a theorem due to Wilks. However, many circumstances relevant to modern experiments can cause this theorem to fail. In this paper, we review how to identify these situations and construct valid inference.
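A small simulation can make the contrast concrete. The sketch below uses a standard textbook setting chosen here for illustration, not an example reproduced from the paper: Gaussian data with unit variance and the null hypothesis that the mean is zero. When the mean is unrestricted, the likelihood ratio statistic follows a $χ^2$ distribution with one degree of freedom under the null. When the alternative is restricted to a non-negative mean, the null value lies on the boundary of the parameter space and the $χ^2$ threshold rejects roughly half as often as its nominal 5%. The function names, sample sizes, and seed are assumptions.

```python
import random


def lrt_statistic(sample, boundary=False):
    """-2 log likelihood ratio for H0: mu = 0, with N(mu, 1) data.

    With boundary=True the alternative is restricted to mu >= 0, so the
    null value sits on the edge of the parameter space.
    """
    n = len(sample)
    mean = sum(sample) / n
    if boundary:
        mean = max(mean, 0.0)  # constrained maximum likelihood estimate
    return n * mean * mean


def rejection_rate(boundary, n_rep=4000, n_obs=20, threshold=3.841, seed=1):
    """Fraction of null simulations whose statistic exceeds `threshold`.

    The default threshold is the 95% quantile of a chi-square
    distribution with one degree of freedom.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_rep):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if lrt_statistic(sample, boundary) > threshold:
            exceed += 1
    return exceed / n_rep


rate_interior = rejection_rate(boundary=False)
rate_boundary = rejection_rate(boundary=True)
print(f"interior null: {rate_interior:.3f}, boundary null: {rate_boundary:.3f}")
```

In the boundary case the test is conservative rather than invalid, but the nominal $χ^2$ calibration is still wrong, which is the kind of mismatch that simulation-based checks can reveal.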

Preprint, 2019, arXiv

Testing One Hypothesis Multiple times

In applied settings, tests of hypothesis where a nuisance parameter is only identifiable under the alternative often reduce to Testing One Hypothesis Multiple times (TOHM). Specifically, a fine discretization of the space of the non-identifiable parameter is specified, and the null hypothesis is tested against a set of sub-alternative hypotheses, one for each point of the discretization. The resulting sub-test statistics are then combined to obtain a global p-value. In this paper, we discuss a computationally efficient inferential tool to perform TOHM under stringent significance requirements, such as those typically required in the physical sciences (e.g., p-value $<10^{-7}$). The resulting procedure leads to a generalized approach to perform inference under non-standard conditions, including comparisons of non-nested models.

Preprint, 2019, arXiv

Testing One Hypothesis Multiple Times: The Multidimensional Case

The identification of new rare signals in data, the detection of a sudden change in a trend, and the selection of competing models are among the most challenging problems in statistical practice. These challenges can be tackled using a test of hypothesis where a nuisance parameter is present only under the alternative, and a computationally efficient solution can be obtained by the "Testing One Hypothesis Multiple times" (TOHM) method. In the one-dimensional setting, a fine discretization of the space of the non-identifiable parameter is specified, and a global p-value is obtained by approximating the distribution of the supremum of the resulting stochastic process. In this paper, we propose a computationally efficient inferential tool to perform TOHM in the multidimensional setting. Here, the approximations of interest typically involve the expected Euler Characteristics (EC) of the excursion set of the underlying random field. We introduce a simple algorithm to compute the EC in multiple dimensions and for arbitrarily large significance levels. This leads to a highly generalizable computational tool to perform inference under non-standard regularity conditions.

Preprint, 2016, arXiv

A method for comparing non-nested models with application to astrophysical searches for new physics

Searches for unknown physics and decisions between competing astrophysical models to explain data both rely on statistical hypothesis testing. The usual approach in searches for new physical phenomena is based on the statistical Likelihood Ratio Test (LRT) and its asymptotic properties. In the common situation when neither of the two models under comparison is a special case of the other, i.e., when the hypotheses are non-nested, this test is not applicable. In astrophysics, this problem occurs when two models that reside in different parameter spaces are to be compared. An important example is the recently reported excess emission in astrophysical $γ$-rays and the question of whether its origin is known astrophysics or dark matter. We develop and study a new, simple, generally applicable, frequentist method and validate its statistical properties using a suite of simulation studies. We exemplify it on realistic simulated data of the Fermi-LAT $γ$-ray satellite, where non-nested hypothesis testing appears in the search for particle dark matter.

Preprint, 2016, arXiv

On methods for correcting for the look-elsewhere effect in searches for new physics

The search for new significant peaks over an energy spectrum often involves a statistical multiple hypothesis testing problem. Separate tests of hypothesis are conducted at different locations, producing an ensemble of local p-values, the smallest of which is reported as evidence for the new resonance. Unfortunately, controlling the false detection rate (type I error rate) of such procedures may lead to excessively stringent acceptance criteria. In the recent physics literature, two promising statistical tools have been proposed to overcome these limitations. In 2005, a method to "find needles in haystacks" was introduced by Pilla et al. [1], and a second method was later proposed by Gross and Vitells [2] in the context of the "look elsewhere effect" and trial factors. We show that, for relatively small sample sizes, the former leads to an artificial inflation of statistical power that stems from an increase in the false detection rate, whereas the two methods exhibit similar performance for large sample sizes. We apply the methods to realistic simulations of the Fermi Large Area Telescope data, in particular the search for dark matter annihilation lines. Further, we discuss the counterintuitive scenario where the look-elsewhere corrections are more conservative than much more computationally efficient corrections for multiple hypothesis testing. Finally, we provide general guidelines for navigating the tradeoffs between statistical and computational efficiency when selecting a statistical procedure for signal detection.
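The difference between the two kinds of correction mentioned above can be sketched numerically. The code below shows two generic multiple-testing corrections (Bonferroni and Šidák, the latter assuming independent tests) next to an upcrossing-based bound of the form associated with Gross and Vitells [2], written here under the assumption that the local statistic is chi-square with one degree of freedom. The function names and all input values are hypothetical, and the sketch does not reproduce the paper's analysis.

```python
from math import erfc, exp, sqrt


def bonferroni(p_local, n_tests):
    """Bonferroni bound on the global p-value for n_tests local tests."""
    return min(1.0, n_tests * p_local)


def sidak(p_local, n_tests):
    """Sidak global p-value, assuming the n_tests local tests are independent."""
    return 1.0 - (1.0 - p_local) ** n_tests


def chi2_1_tail(c):
    """P(X > c) for X chi-square distributed with one degree of freedom."""
    return erfc(sqrt(c / 2.0))


def upcrossing_bound(c, c0, mean_upcrossings_c0):
    """Upper bound on P(max statistic > c) using upcrossings of a level c0 < c.

    mean_upcrossings_c0 is the expected number of upcrossings of c0 under
    the null, typically estimated from a small number of simulations.
    """
    bound = chi2_1_tail(c) + mean_upcrossings_c0 * exp(-(c - c0) / 2.0)
    return min(1.0, bound)


# Hypothetical inputs for illustration only.
p_local = chi2_1_tail(25.0)  # local p-value of a statistic equal to 25
print(f"local p-value:     {p_local:.3e}")
print(f"Bonferroni (100):  {bonferroni(p_local, 100):.3e}")
print(f"Sidak (100):       {sidak(p_local, 100):.3e}")
print(f"upcrossing bound:  {upcrossing_bound(25.0, 1.0, 5.0):.3e}")
```

The generic corrections need only the number of tests, whereas the upcrossing bound needs an estimate of the expected number of upcrossings at a low threshold. Which correction is more conservative depends on the inputs.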
