Researcher profile

Sara Algeri

Sara Algeri contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
10 topics
4 close collaborators

Published work

8 published items

preprint, 2022, arXiv

Estimating the lifetime risk of a false positive screening test result

False positive results in screening tests have potentially severe psychological, medical, and financial consequences for the recipient. However, there have been few efforts to quantify how the risk of a false positive accumulates over time. We seek to fill this gap by estimating the probability that an individual who adheres to the U.S. Preventive Services Task Force (USPSTF) screening guidelines will receive at least one false positive in a lifetime. To do so, we assembled a data set of 116 studies cited by the USPSTF that report the number of true positives, false negatives, true negatives, and false positives for the primary screening procedure for one of five cancers or six sexually transmitted diseases. We use these data to estimate the probability that an individual in one of 14 demographic subpopulations will receive at least one false positive for one of these eleven diseases in a lifetime. We specify a suitable statistical model to account for the hierarchical structure of the data, and we use the parametric bootstrap to quantify the uncertainty surrounding our estimates. The estimated probability of receiving at least one false positive in a lifetime is 85.5% ($\pm$0.9%) and 38.9% ($\pm$3.6%) for baseline groups of women and men, respectively. It is higher for subpopulations recommended to screen more frequently than the baseline, including more vulnerable groups such as pregnant women and men who have sex with men. Since screening technology is imperfect, false positives remain inevitable. The high lifetime risk of a false positive reveals the importance of educating patients about this phenomenon.
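
The way a small per-test risk accumulates over a lifetime can be illustrated with a minimal sketch. It assumes independent tests with a constant false positive probability per test, which is a simplification and not the hierarchical model or parametric bootstrap described in the abstract. The function name and the example schedule are hypothetical.

```python
from math import prod


def prob_at_least_one_false_positive(schedule):
    """Lifetime probability of at least one false positive.

    schedule: iterable of (per_test_false_positive_prob, number_of_tests).
    Assumes all tests are independent (a simplifying assumption).
    """
    # Probability that every single test avoids a false positive.
    p_none = prod((1.0 - p) ** n for p, n in schedule)
    return 1.0 - p_none


# Hypothetical numbers for illustration only, not taken from the paper:
# 20 tests at a 5% false positive rate and 10 tests at a 2% rate.
example_schedule = [(0.05, 20), (0.02, 10)]
lifetime_risk = prob_at_least_one_false_positive(example_schedule)
print(f"{lifetime_risk:.3f}")
```

Even with modest per-test rates, the complement of a long product of "no false positive" probabilities grows quickly, which is the qualitative effect the abstract quantifies.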

preprint, 2022, arXiv

Informative Goodness-of-Fit for Multivariate Distributions

This article introduces an informative goodness-of-fit (iGOF) approach to study multivariate distributions. When the null model is rejected, iGOF allows us to identify the underlying sources of mismodeling and naturally equips practitioners with additional insights on the nature of the deviations from the true distribution. The informative character of the procedure is achieved by exploiting smooth tests and random fields theory to facilitate the analysis of multivariate data. Simulation studies show that iGOF enjoys high power for different types of alternatives. The methods presented here directly address the problem of background mismodeling arising in physics and astronomy. It is in these areas that the motivation of this work is rooted.

preprint, 2022, arXiv

K-2 rotated goodness-of-fit for multivariate data

Consider a set of multivariate distributions, $F_1,\dots,F_M$, aiming to explain the same phenomenon. For instance, each $F_m$ may correspond to a different candidate background model for calibration data, or to one of many possible signal models we aim to validate on experimental data. In this article, we show that tests for a wide class of apparently different models $F_{m}$ can be mapped into a single test for a reference distribution $Q$. As a result, valid inference for each $F_m$ can be obtained by simulating only the distribution of the test statistic under $Q$. Furthermore, $Q$ can be chosen to be conveniently simple, which substantially reduces the computational time.

preprint, 2021, arXiv

Exhaustive goodness-of-fit via smoothed inference and graphics

Classical tests of goodness-of-fit aim to validate the conformity of a postulated model to the data under study. Given their inferential nature, they can be considered a crucial step in confirmatory data analysis. In their standard formulation, however, they do not allow exploring how the hypothesized model deviates from the truth nor do they provide any insight into how the rejected model could be improved to better fit the data. The main goal of this work is to establish a comprehensive framework for goodness-of-fit which naturally integrates modeling, estimation, inference, and graphics. Modeling and estimation focus on a novel formulation of smooth tests that easily extends to arbitrary distributions, either continuous or discrete. Inference and adequate post-selection adjustments are performed via a specially designed smoothed bootstrap and the results are summarized via an exhaustive graphical tool called CD-plot.

preprint, 2019, arXiv

Detecting new signals under background mismodelling

Searches for new astrophysical phenomena often involve several sources of non-random uncertainties which can lead to highly misleading results. Among these, model uncertainty arising from background mismodelling can dramatically compromise the sensitivity of the experiment under study. Specifically, overestimating the background distribution in the signal region increases the chances of missing new physics. Conversely, underestimating the background outside the signal region leads to an artificially enhanced sensitivity and a higher likelihood of claiming false discoveries. The aim of this work is to provide a unified statistical strategy to perform modelling, estimation, inference, and signal characterization under background mismodelling. The proposed method allows one to incorporate the (partial) scientific knowledge available on the background distribution and provides a data-updated version of it in a purely nonparametric fashion without requiring the specification of prior distributions on the parameters. Applications in the context of dark matter searches and radio surveys show how the tools presented in this article can be used to incorporate non-stochastic uncertainty due to instrumental noise and to overcome violations of classical distributional assumptions in stacking experiments.

preprint, 2019, arXiv

Searching for new physics with profile likelihoods: Wilks and beyond

Particle physics experiments use likelihood ratio tests extensively to compare hypotheses and to construct confidence intervals. Often, the null distribution of the likelihood ratio test statistic is approximated by a $\chi^2$ distribution, following a theorem due to Wilks. However, many circumstances relevant to modern experiments can cause this theorem to fail. In this paper, we review how to identify these situations and construct valid inference.
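
As a point of reference for the approximation mentioned in this abstract, the following minimal sketch checks the chi-squared calibration of a likelihood ratio test in a regular setting where Wilks' theorem is expected to hold: testing a zero mean for unit-variance Gaussian data. It does not reproduce any of the failure cases reviewed in the paper. The function names, sample sizes, and seed are hypothetical choices.

```python
import math
import random


def lrt_statistic(sample):
    """-2 log likelihood ratio for H0: mu = 0 vs. free mu, unit-variance Gaussian."""
    n = len(sample)
    mean = sum(sample) / n
    # For known unit variance the statistic reduces to n * mean^2.
    return n * mean * mean


def chi2_1_sf(x):
    """Survival function of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


random.seed(0)
n_sims, n_obs = 2000, 50
threshold = 3.841458820694124  # 95th percentile of chi-squared with 1 d.o.f.

# Fraction of null simulations rejected at the nominal 5% level.
rejections = sum(
    lrt_statistic([random.gauss(0.0, 1.0) for _ in range(n_obs)]) > threshold
    for _ in range(n_sims)
)
rejection_rate = rejections / n_sims
print(rejection_rate)
```

In this regular case the empirical rejection rate should sit close to the nominal 5%. When the conditions of the theorem are violated, the same check would be one simple way to notice that the chi-squared threshold is miscalibrated.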

preprint, 2019, arXiv

Testing One Hypothesis Multiple times

In applied settings, tests of hypothesis where a nuisance parameter is only identifiable under the alternative often reduce to one of Testing One Hypothesis Multiple times (TOHM). Specifically, a fine discretization of the space of the non-identifiable parameter is specified, and the null hypothesis is tested against a set of sub-alternative hypotheses, one for each point of the discretization. The resulting sub-test statistics are then combined to obtain a global p-value. In this paper, we discuss a computationally efficient inferential tool to perform TOHM under stringent significance requirements, such as those typically required in the physical sciences (e.g., p-value $<10^{-7}$). The resulting procedure leads to a generalized approach to perform inference under non-standard conditions, including non-nested model comparisons.
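
The step of combining sub-test results into a global p-value can be sketched with a Bonferroni correction. This is a conservative textbook stand-in, not the efficient procedure the paper proposes, and it tends to be overly conservative when the sub-tests on a fine grid are strongly correlated. The function name and the local p-values are hypothetical.

```python
def bonferroni_global_p_value(local_p_values):
    """Conservative global p-value from the sub-test p-values on the grid."""
    if not local_p_values:
        raise ValueError("at least one local p-value is required")
    # Multiply the smallest local p-value by the number of sub-tests, capped at 1.
    return min(1.0, len(local_p_values) * min(local_p_values))


# Hypothetical local p-values, one per grid point of the nuisance parameter.
local_p = [0.20, 0.004, 0.03, 0.51, 0.0009]
global_p = bonferroni_global_p_value(local_p)
print(global_p)
```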

preprint, 2019, arXiv

Testing One Hypothesis Multiple Times: The Multidimensional Case

The identification of new rare signals in data, the detection of a sudden change in a trend, and the selection of competing models are among the most challenging problems in statistical practice. These challenges can be tackled using a test of hypothesis where a nuisance parameter is present only under the alternative, and a computationally efficient solution can be obtained by the "Testing One Hypothesis Multiple times" (TOHM) method. In the one-dimensional setting, a fine discretization of the space of the non-identifiable parameter is specified, and a global p-value is obtained by approximating the distribution of the supremum of the resulting stochastic process. In this paper, we propose a computationally efficient inferential tool to perform TOHM in the multidimensional setting. Here, the approximations of interest typically involve the expected Euler Characteristics (EC) of the excursion set of the underlying random field. We introduce a simple algorithm to compute the EC in multiple dimensions and for arbitrarily large significance levels. This leads to a highly generalizable computational tool to perform inference under non-standard regularity conditions.
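
To make the notion of the Euler characteristic of an excursion set concrete, the sketch below computes it for a single two-dimensional field sampled on a pixel grid. It treats every pixel at or above the threshold as a closed unit square and counts vertices minus edges plus faces. This is a generic illustration under those assumptions, not the algorithm introduced in the paper, which concerns the expected EC of a random field; the function name and example field are hypothetical.

```python
def euler_characteristic(field, threshold):
    """Euler characteristic of {field >= threshold}, pixels as closed unit squares."""
    vertices, edges, faces = set(), set(), 0
    for i, row in enumerate(field):
        for j, value in enumerate(row):
            if value < threshold:
                continue
            faces += 1
            # Four lattice corners of pixel (i, j).
            vertices.update(
                [(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)]
            )
            # Four sides, each stored as an ordered pair of endpoints so that
            # sides shared by neighbouring pixels are counted only once.
            edges.update([
                ((i, j), (i + 1, j)),
                ((i, j), (i, j + 1)),
                ((i + 1, j), (i + 1, j + 1)),
                ((i, j + 1), (i + 1, j + 1)),
            ])
    return len(vertices) - len(edges) + faces


# Hypothetical field: a ring of high values around a low centre.
ring = [
    [2.0, 2.0, 2.0],
    [2.0, 0.0, 2.0],
    [2.0, 2.0, 2.0],
]
ring_ec = euler_characteristic(ring, 1.0)
print(ring_ec)
```

With this convention the EC equals the number of connected components minus the number of holes, where pixels touching only at a corner count as connected.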