Researcher profile

Alessandro Provetti

Alessandro Provetti contributes to research discovery and scholarly infrastructure.

Researcher, Affiliation not imported, Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
8 works
0 followers
7 topics
4 close collaborators

Published work

8 published items

preprint, 2022, arXiv

A Hybrid Model for Forecasting Short-Term Electricity Demand

Currently, the UK electricity market is guided by load (demand) forecasts published every thirty minutes by the regulator. A key factor in predicting demand is weather conditions, for which forecasts are published every hour. We present HYENA: a hybrid predictive model that combines feature engineering (selection of the candidate predictor features), mobile-window predictors and finally LSTM encoder-decoders to achieve higher accuracy than mainstream models from the literature. HYENA decreased MAPE loss by 16% and RMSE loss by 10% over the best available benchmark model, thus establishing a new state of the art for UK electric load (and price) forecasting.
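
The two error measures named in the abstract, MAPE and RMSE, can be sketched as follows. This is a minimal illustration of the standard definitions, not code from the paper: the function names, the helper `relative_reduction` and the load values are all made up for the example.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the inputs."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error (hypothetical helper)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Made-up load values, for illustration only (not data from the paper).
actual = [100.0, 200.0, 400.0]
baseline_pred = [110.0, 180.0, 440.0]
model_pred = [105.0, 190.0, 420.0]

mape_gain = relative_reduction(mape(actual, baseline_pred), mape(actual, model_pred))
rmse_gain = relative_reduction(rmse(actual, baseline_pred), rmse(actual, model_pred))
```

A statement such as "decreased MAPE loss by 16%" is read here as a relative reduction of this kind with respect to the benchmark model; that reading is an assumption about the abstract's wording.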

preprint, 2022, arXiv

Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark

The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident. The dataset is created by aggregating publicly available datasets from the UK Department for Transport, which are drastically imbalanced with missing attributes sometimes approaching 50% of the overall data dimensionality. The paper presents the data analysis pipeline starting from the publicly available data of road traffic accidents and ending with predictors of possible injuries and their degree of severity. It addresses the huge incompleteness of public data with a MissForest model. The paper also introduces two baseline approaches to create injury predictors: a supervised artificial neural network and a reinforcement learning model. The dataset can potentially stimulate diverse aspects of machine learning research on imbalanced datasets and the two approaches can be used as baseline references when researchers test more advanced learning algorithms in this area.

preprint, 2020, arXiv

A general centrality framework based on node navigability

Centrality metrics are a popular tool in Network Science to identify important nodes within a graph. We introduce the Potential Gain as a centrality measure that unifies many walk-based centrality metrics in graphs and captures the notion of node navigability, interpreted as the property of being reachable from anywhere else (in the graph) through short walks. Two instances of the Potential Gain (called the Geometric and the Exponential Potential Gain) are presented and we describe scalable algorithms for computing them on large graphs. We also give a proof of the relationship between the new measures and established centralities. The geometric potential gain of a node can thus be characterized as the product of its Degree centrality and its Katz centrality score. Likewise, the exponential potential gain of a node is proved to be the product of its Degree centrality and its Communicability index. These formal results connect potential gain to both the "popularity" and "similarity" properties that are captured by the above centralities.
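
The product relation stated in the abstract (degree centrality times Katz centrality) can be illustrated with a small sketch. It assumes the common walk-counting form of Katz centrality with an attenuation factor `alpha` and no normalization; the paper's exact definition and scaling may differ, and the function names and the toy star graph are hypothetical.

```python
def katz_centrality(adj, alpha=0.1, iters=200):
    """Katz score x_v = sum over k >= 1 of alpha^k * (number of walks of length k ending at v).

    adj maps each node to a list of its neighbours (undirected graph).
    The series converges only if alpha < 1 / (largest adjacency eigenvalue).
    """
    x = {v: 0.0 for v in adj}
    for _ in range(iters):
        # Fixed-point iteration of x = alpha * A * (1 + x).
        x = {v: alpha * sum(1.0 + x[u] for u in adj[v]) for v in adj}
    return x

def degree_times_katz(adj, alpha=0.1):
    """Product of degree and Katz score for every node (illustrative composite score)."""
    katz = katz_centrality(adj, alpha)
    return {v: len(adj[v]) * katz[v] for v in adj}

# Toy star graph: node 0 is the hub, nodes 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
katz = katz_centrality(star)
scores = degree_times_katz(star)
```

On the star graph the hub receives the highest score, since it has both the largest degree and the largest Katz score.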

preprint, 2020, arXiv

COVID-19 Contact Tracing: Eight Privacy Questions Explored

We respond to a recent short paper by de Montjoye et al. on privacy issues with Covid-19 tracking. Their paper, which we discuss here, is structured around three "toy protocols" for the design of an app which can maximise the utility of contact tracing information while minimising the more general risk to privacy. On this basis, the paper proceeds to introduce eight questions against which such protocols should be assessed. The questions raised and the protocols proposed effectively amount to the creation of a game with different categories of players able to make different moves. It is therefore possible to analyse the model in terms of optimal game design.

preprint, 2020, arXiv

Exploring Low-degree Nodes First Accelerates Network Exploration

We consider information diffusion on Web-like networks and how random walks can simulate it. A well-studied problem in this domain is Partial Cover Time, i.e., the calculation of the expected number of steps a random walker needs to visit a given fraction of the nodes of the network. We notice that some of the fastest solutions in fact require that nodes have perfect knowledge of the degree distribution of their neighbors, which in many practical cases is not obtainable, e.g., for privacy reasons. We thus introduce a version of the Cover problem that considers such limitations: Partial Cover Time with Budget. The budget is a limit on the number of neighbors that can be inspected for their degree; we have adapted optimal random-walk strategies from the literature to operate under such a budget. Our solution is called Min-degree (MD) and, essentially, it biases random walkers towards visiting peripheral areas of the network first. Extensive benchmarking on six real datasets shows that the MD strategy, perhaps counter-intuitively, is in fact highly competitive with state-of-the-art algorithms for cover.
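
The idea of a budgeted, low-degree-first walk can be sketched as below. The selection rule used here (inspect at most `budget` randomly chosen neighbours and move to the one with the lowest degree, breaking ties at random) is an assumption made for illustration; the abstract does not give the exact MD rule. All names and the toy ring graph are hypothetical.

```python
import random

def min_degree_walk(adj, start, target_fraction, budget, rng, max_steps=100_000):
    """Walk until target_fraction of the nodes has been visited.

    adj maps each node to a non-empty list of neighbours.
    Returns (number of steps taken, set of visited nodes).
    max_steps guards against walks that never reach the target.
    """
    visited = {start}
    needed = target_fraction * len(adj)
    current = start
    steps = 0
    while len(visited) < needed and steps < max_steps:
        neighbours = adj[current]
        # Only `budget` neighbours may be inspected for their degree.
        inspected = rng.sample(neighbours, min(budget, len(neighbours)))
        lowest = min(len(adj[u]) for u in inspected)
        candidates = [u for u in inspected if len(adj[u]) == lowest]
        current = rng.choice(candidates)
        visited.add(current)
        steps += 1
    return steps, visited

# Toy graph: a ring of 10 nodes; with budget=1 the walk is a plain random walk.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
steps, visited = min_degree_walk(ring, start=0, target_fraction=0.5,
                                 budget=1, rng=random.Random(0))
```

Averaging `steps` over many seeds would give an empirical estimate of the partial cover time under a given budget.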

preprint, 2020, arXiv

Potential gain as a centrality measure

Navigability is a distinctive feature of graphs associated with artificial or natural systems whose primary goal is the transportation of information or goods. We say that a graph $\mathcal{G}$ is navigable when an agent is able to efficiently reach any target node in $\mathcal{G}$ by means of local routing decisions. In a social network, navigability translates to the ability to reach an individual through personal contacts. Graph navigability is well-studied, but a fundamental question is still open: why are some individuals more likely than others to be reached via short, friend-of-a-friend communication chains? In this article we answer the question above by proposing a novel centrality metric called the potential gain, which, in an informal sense, quantifies the ease with which a target node can be reached. We define two variants of the potential gain, called the geometric and the exponential potential gain, and present fast algorithms to compute them. The geometric and the exponential potential gain are the first instances of a novel class of composite centrality metrics, i.e., centrality metrics which combine the popularity of a node in $\mathcal{G}$ with its similarity to all other nodes. As shown in previous studies, popularity and similarity are two main criteria which regulate the way humans seek information in large networks such as Wikipedia. We give a formal proof that the potential gain of a node is always equivalent to the product of its degree centrality (which captures popularity) and its Katz centrality (which captures similarity).

preprint, 2012, arXiv

Generalized Louvain Method for Community Detection in Large Networks

In this paper we present a novel strategy to discover the community structure of (possibly large) networks. This approach is based on the well-known concept of network modularity optimization. To do so, our algorithm exploits a novel measure of edge centrality, based on k-paths. This technique makes it possible to efficiently compute an edge ranking in large networks in near-linear time. Once the centrality ranking is calculated, the algorithm computes the pairwise proximity between nodes of the network. Finally, it discovers the community structure by adopting a strategy inspired by the well-known state-of-the-art Louvain method (henceforth, LM), efficiently maximizing the network modularity. The experiments we carried out show that our algorithm outperforms other techniques and slightly improves on the results of the original LM, providing reliable results. Another advantage is that, unlike the LM, it extends naturally even to unweighted networks.
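
The objective that Louvain-style methods maximize, network modularity, can be sketched for an unweighted, undirected graph as follows. This shows only the standard modularity score of a given partition; it does not implement the k-path edge centrality or the algorithm described in the abstract. The function name and the toy graph are hypothetical.

```python
def modularity(adj, community):
    """Modularity Q = sum over communities c of (L_c / m - (d_c / (2m))^2).

    adj maps each node to a list of neighbours, with every edge listed in both directions.
    community maps each node to a community label.
    L_c is the number of edges inside c, d_c the total degree of c, m the number of edges.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    internal = {}    # counts each internal edge twice, i.e. 2 * L_c
    degree_sum = {}  # d_c
    for v, nbrs in adj.items():
        c = community[v]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        internal[c] = internal.get(c, 0) + sum(1 for u in nbrs if community[u] == c)
    return sum(internal[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
q_split = modularity(graph, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
q_single = modularity(graph, {v: "all" for v in graph})
```

Splitting the toy graph into its two triangles scores higher than putting every node in one community, which is the kind of comparison a modularity-maximizing method relies on.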

preprint, 2011, arXiv

Crawling Facebook for Social Network Analysis Purposes

We describe our work in the collection and analysis of massive data describing the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, including degree distribution, centrality measures, scaling laws and distribution of friendship.
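
One of the properties listed above, the degree distribution of an undirected graph, can be computed from an edge list as in the following sketch. It is not the authors' tooling: the function name and the sample edges are made up, and it assumes each undirected edge appears once in the list.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree k to the fraction of nodes that have degree k.

    edges is an iterable of (u, v) pairs, each undirected edge listed once.
    Nodes that appear in no edge are not counted.
    """
    degree = Counter()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}

# Made-up anonymized edge list, for illustration only.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
distribution = degree_distribution(edges)
```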