Researcher profile

Antonietta Mira

Antonietta Mira contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
9 works
0 followers
4 topics
4 close collaborators

Published work

9 published items

Preprint, 2022, arXiv

A predictive model for planning emergency events rescue during COVID-19 in Lombardy, Italy

Italy, particularly the Lombardy region, was among the first countries outside of Asia to report cases of COVID-19. The emergency medical service called Regional Emergency Agency (AREU) coordinates the intra- and inter-regional non-hospital emergency network and the European emergency number service in Lombardy. AREU must deal with daily and seasonal variations of call volume. The number and type of emergency calls changed dramatically during the COVID-19 pandemic. A model to predict incoming calls and how many of these turn into events, i.e., dispatch of transport and equipment until the rescue is completed, was developed to address the emergency period. We used the generalized additive model with a negative binomial family to predict the number of events one, two, five, and seven days ahead. The over-dispersion of the data was tackled by using the negative binomial family and the nonlinear relationship between the number of events and covariates (e.g., seasonal effects) by smoothing splines. The model coefficients show the effect of variables, e.g., the day of the week, on the number of events and how these effects change during the pre-COVID-19 period. The proposed model returns reasonable mean absolute errors for most of the 2020-2021 period.

Preprint, 2022, arXiv

Learning Summary Statistics for Bayesian Inference with Autoencoders

For stochastic models with intractable likelihood functions, approximate Bayesian computation offers a way of approximating the true posterior through repeated comparisons of observations with simulated model outputs in terms of a small set of summary statistics. These statistics need to retain the information that is relevant for constraining the parameters but cancel out the noise. They can thus be seen as thermodynamic state variables, for general stochastic models. For many scientific applications, we need strictly more summary statistics than model parameters to reach a satisfactory approximation of the posterior. Therefore, we propose to use the inner dimension of deep neural network based Autoencoders as summary statistics. To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information on the noise that has been used to generate the training data. We validate the approach empirically on two types of stochastic models.
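
As a rough illustration of the approximate Bayesian computation loop described in this abstract, the sketch below runs plain rejection ABC on a toy model. Everything in it is an assumption made for illustration: the Normal simulator, the Uniform(-5, 5) prior, the tolerance, and the hand-picked summary statistic (the sample mean). The paper learns its summary statistics with an autoencoder; that step is not reproduced here.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator: n draws from Normal(theta, 1). Stands in for an intractable model."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Hand-picked summary statistic (the sample mean), an illustrative assumption."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # prior: Uniform(-5, 5), an assumption
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "observations" generated with theta = 2
posterior_sample = abc_rejection(observed, 5000, 0.1, rng)
posterior_mean = statistics.fmean(posterior_sample)
```

The accepted values approximate the posterior of theta given the summary statistic. A smaller tolerance gives a closer approximation at the cost of fewer accepted draws.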

Preprint, 2022, arXiv

On the intrinsic dimensionality of Covid-19 data: a global perspective

This paper aims to develop a global perspective of the complexity of the relationship between the standardised per-capita growth rate of Covid-19 cases, deaths, and the OxCGRT Covid-19 Stringency Index, a measure describing a country's stringency of lockdown policies. To achieve our goal, we use a heterogeneous intrinsic dimension estimator implemented as a Bayesian mixture model, called Hidalgo. We identify that the Covid-19 dataset may project onto two low-dimensional manifolds without significant information loss. The low dimensionality suggests strong dependency among the standardised growth rates of cases and deaths per capita and the OxCGRT Covid-19 Stringency Index for a country over 2020-2021. Given the low dimensional structure, it may be feasible to model observable Covid-19 dynamics with few parameters. Importantly, we identify spatial autocorrelation in the intrinsic dimension distribution worldwide. Moreover, we highlight that high-income countries are more likely to lie on low-dimensional manifolds, likely arising from aging populations, comorbidities, and increased per capita mortality burden from Covid-19. Finally, we temporally stratify the dataset to examine the intrinsic dimension at a more granular level throughout the Covid-19 pandemic.

Preprint, 2022, arXiv

Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning

Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. However, recent studies show that the present pathology tests to detect CVD are ineffectual, as they do not consider different stages of platelet activation or the molecular dynamics involved in platelet interactions and are incapable of accounting for inter-individual variability. Here we propose a stochastic platelet deposition model and an inferential scheme to estimate the biologically meaningful model parameters using approximate Bayesian computation with a summary statistic that maximally discriminates between different types of patients. Parameters inferred from data collected on healthy volunteers and different patient types help us to identify specific biological parameters, and hence the biological reasoning behind the dysfunction, for each type of patient. This work opens up an unprecedented opportunity for a personalized pathology test for CVD detection and medical treatment.

Preprint, 2022, arXiv

Regularized Zero-Variance Control Variates

Zero-variance control variates (ZV-CV) are a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort lies in solving a linear regression problem. Significant variance reductions have been achieved with this method in low dimensional examples, but the number of covariates in the regression rapidly increases with the dimension of the target. In this paper, we present compelling empirical evidence that the use of penalized regression techniques in the selection of high-dimensional control variates provides performance gains over the classical least squares method. Another type of regularization based on using subsets of derivatives, or a priori regularization as we refer to it in this paper, is also proposed to reduce computational and storage requirements. Several examples showing the utility and limitations of regularized ZV-CV for Bayesian inference are given. The methods proposed in this paper are accessible through the R package ZVCV.
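
To make the regression step in this abstract concrete, here is a minimal one-dimensional sketch of a first-order control variate built from the derivative of the log target. It uses ordinary least squares, not the penalized regression the paper studies, and it does not use the ZVCV R package. The function name `zv_cv_estimate`, the Gaussian target, and the use of independent draws in place of MCMC output are all assumptions for illustration.

```python
import random
import statistics


def zv_cv_estimate(samples, f, grad_log_target):
    """First-order control-variate estimate of E[f(X)] in one dimension.

    The score u(x) = d/dx log pi(x) has expectation zero under pi (given
    suitable tail conditions), so f(x) + coef * u(x) has the same mean as f(x).
    The coefficient is the negated least-squares slope of f on u.
    """
    fx = [f(x) for x in samples]
    ux = [grad_log_target(x) for x in samples]
    f_bar = statistics.fmean(fx)
    u_bar = statistics.fmean(ux)
    cov_fu = sum((a - f_bar) * (b - u_bar) for a, b in zip(fx, ux))
    var_u = sum((b - u_bar) ** 2 for b in ux)
    coef = -cov_fu / var_u
    return statistics.fmean([a + coef * b for a, b in zip(fx, ux)])


# Toy target: Normal(MU, SIGMA^2); direct draws stand in for MCMC samples.
MU, SIGMA = 3.0, 2.0
rng = random.Random(1)
samples = [rng.gauss(MU, SIGMA) for _ in range(200)]

plain = statistics.fmean(samples)  # ordinary Monte Carlo estimate of E[X]
zv = zv_cv_estimate(samples, lambda x: x, lambda x: -(x - MU) / SIGMA**2)
```

In this toy case f(x) = x is exactly linear in the score, so the corrected estimate recovers the true mean up to floating-point error, while the plain average still carries Monte Carlo noise. For general f and higher dimensions the reduction is only partial, and the number of regression covariates grows with the dimension, which is the setting the abstract addresses.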

Preprint, 2020, arXiv

A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data

The use of high-dimensional data for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested Common Atoms Model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice-sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.

Preprint, 2020, arXiv

Data segmentation based on the local intrinsic dimension

One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded vs unfolded configurations in a protein molecular dynamics trajectory, active vs non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
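
For readers unfamiliar with intrinsic dimension, the sketch below estimates a single global ID from nearest-neighbour distance ratios (an estimator often called TWO-NN). This is a simpler, swapped-in technique chosen for illustration; it is not the local segmentation approach developed in the paper. The function name, the brute-force neighbour search, and the synthetic data set are assumptions.

```python
import math
import random


def two_nn_dimension(points):
    """Maximum-likelihood estimate N / sum(log(r2 / r1)), where r1 and r2 are
    each point's first and second nearest-neighbour distances."""
    log_ratio_sum = 0.0
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1  # new nearest; old nearest becomes second
            elif d < r2:
                r2 = d
        log_ratio_sum += math.log(r2 / r1)
    return len(points) / log_ratio_sum


# Synthetic data: a 2-dimensional plane linearly embedded in 5 coordinates.
rng = random.Random(2)
points = []
for _ in range(400):
    a, b = rng.random(), rng.random()
    points.append((a, b, a + b, a - b, 2.0 * a))

estimated_id = two_nn_dimension(points)
```

Although each point has five coordinates, only two of them vary freely, so the estimate should land near 2. The brute-force search is quadratic in the number of points and is meant only for small examples.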

Preprint, 2020, arXiv

Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation

The Banana Bunchy Top Virus (BBTV) is one of the most economically important vector-borne banana diseases throughout the Asia-Pacific Basin and presents a significant challenge to the agricultural sector. Current models of BBTV are largely deterministic, limited by an incomplete understanding of interactions in complex natural systems, and the appropriate identification of parameters. A stochastic network-based Susceptible-Infected model has been created which simulates the spread of BBTV across the subsections of a banana plantation, parameterising nodal recovery, neighbouring and distant infectivity across summer and winter. Findings from posterior results achieved through a Markov chain Monte Carlo approach to approximate Bayesian computation suggest seasonality in all parameters, which are influenced by correlated changes in inspection accuracy, temperatures and aphid activity. This paper demonstrates how the model may be used for monitoring and forecasting of various disease management strategies to support policy-level decision making.

Preprint, 2020, arXiv

The role of intrinsic dimension in high-resolution player tracking data -- Insights in basketball

A new range of statistical analysis has emerged in sports after the introduction of high-resolution player tracking technology, specifically in basketball. However, these high-dimensional data are often challenging for statistical inference and decision making. In this article, we employ Hidalgo, a state-of-the-art Bayesian mixture model that allows the estimation of heterogeneous intrinsic dimensions (ID) within a dataset, and propose some theoretical enhancements. ID results can be interpreted as indicators of variability and complexity of basketball plays and games. This technique allows classification and clustering of NBA basketball players' movement and shot chart data. Analyzing movement data, Hidalgo identifies key stages of offensive actions such as creating space for passing, preparation/shooting and following through. We found that the ID value spikes, reaching a peak between 4 and 8 seconds in the offensive part of the court, after which it declines. In shot charts, we obtained groups of shots that produce substantially higher and lower successes. Overall, game-winners tend to have a larger intrinsic dimension, which is an indication of more unpredictability and unique shot placements. Similarly, we found higher ID values in plays when the score margin is small compared to large-margin ones. These outcomes could be exploited by coaches to obtain better offensive/defensive results.