Source author record

Francesco Denti

Francesco Denti appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications Machine Learning Computation Methodology

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

On the intrinsic dimensionality of Covid-19 data: a global perspective

This paper aims to develop a global perspective of the complexity of the relationship between the standardised per-capita growth rate of Covid-19 cases, deaths, and the OxCGRT Covid-19 Stringency Index, a measure describing a country's stringency of lockdown policies. To achieve our goal, we use a heterogeneous intrinsic dimension estimator implemented as a Bayesian mixture model, called Hidalgo. We identify that the Covid-19 dataset may project onto two low-dimensional manifolds without significant information loss. The low dimensionality suggests strong dependency among the standardised growth rates of cases and deaths per capita and the OxCGRT Covid-19 Stringency Index for a country over 2020-2021. Given the low dimensional structure, it may be feasible to model observable Covid-19 dynamics with few parameters. Importantly, we identify spatial autocorrelation in the intrinsic dimension distribution worldwide. Moreover, we highlight that high-income countries are more likely to lie on low-dimensional manifolds, likely arising from aging populations, comorbidities, and increased per capita mortality burden from Covid-19. Finally, we temporally stratify the dataset to examine the intrinsic dimension at a more granular level throughout the Covid-19 pandemic.

preprint2020arXiv

A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data

The use of high-dimensional data for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested Common Atoms Model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice-sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.

preprint2020arXiv

Data segmentation based on the local intrinsic dimension

One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded vs unfolded configurations in a protein molecular dynamics trajectory, active vs non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.

preprint2020arXiv

The role of intrinsic dimension in high-resolution player tracking data -- Insights in basketball

A new range of statistical analysis has emerged in sports after the introduction of the high-resolution player tracking technology, specifically in basketball. However, this high dimensional data is often challenging for statistical inference and decision making. In this article, we employ Hidalgo, a state-of-the-art Bayesian mixture model that allows the estimation of heterogeneous intrinsic dimensions (ID) within a dataset and propose some theoretical enhancements. ID results can be interpreted as indicators of variability and complexity of basketball plays and games. This technique allows classification and clustering of NBA basketball player's movement and shot charts data. Analyzing movement data, Hidalgo identifies key stages of offensive actions such as creating space for passing, preparation/shooting and following through. We found that the ID value spikes reaching a peak between 4 and 8 seconds in the offensive part of the court after which it declines. In shot charts, we obtained groups of shots that produce substantially higher and lower successes. Overall, game-winners tend to have a larger intrinsic dimension which is an indication of more unpredictability and unique shot placements. Similarly, we found higher ID values in plays when the score margin is small compared to large margin ones. These outcomes could be exploited by coaches to obtain better offensive/defensive results.

Francesco Denti

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

On the intrinsic dimensionality of Covid-19 data: a global perspective

A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data

Data segmentation based on the local intrinsic dimension

The role of intrinsic dimension in high-resolution player tracking data -- Insights in basketball