Source author record

Alessandro Provetti

Alessandro Provetti appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Social and Information Networks, physics.soc-ph, Artificial Intelligence, cs.CY, Data Structures and Algorithms, Logic in Computer Science, Machine Learning, Cryptography and Security, Human-Computer Interaction, Information Retrieval, Multiagent Systems, physics.data-an

Catalog footprint

20 works, 12 topics, 4 close collaborators


preprint · 2022 · arXiv

A Hybrid Model for Forecasting Short-Term Electricity Demand

Currently the UK electricity market is guided by load (demand) forecasts published every thirty minutes by the regulator. A key factor in predicting demand is weather conditions, with forecasts published every hour. We present HYENA: a hybrid predictive model that combines feature engineering (selection of the candidate predictor features), mobile-window predictors and finally LSTM encoder-decoders to achieve higher accuracy with respect to mainstream models from the literature. HYENA decreased MAPE loss by 16% and RMSE loss by 10% over the best available benchmark model, thus establishing a new state of the art for UK electric load (and price) forecasting.
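The two error measures named in the abstract can be illustrated with a minimal sketch. The function names and the sample load values below are hypothetical and are not taken from the paper; the formulas are the usual textbook definitions of MAPE and RMSE.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical load values, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free, which makes it convenient for comparing models across periods with different demand levels, while RMSE penalizes large individual misses more heavily.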

preprint · 2022 · arXiv

Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark

The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident. The dataset is created by aggregating publicly available datasets from the UK Department for Transport, which are drastically imbalanced with missing attributes sometimes approaching 50% of the overall data dimensionality. The paper presents the data analysis pipeline starting from the publicly available data of road traffic accidents and ending with predictors of possible injuries and their degree of severity. It addresses the huge incompleteness of public data with a MissForest model. The paper also introduces two baseline approaches to create injury predictors: a supervised artificial neural network and a reinforcement learning model. The dataset can potentially stimulate diverse aspects of machine learning research on imbalanced datasets and the two approaches can be used as baseline references when researchers test more advanced learning algorithms in this area.

preprint · 2020 · arXiv

A general centrality framework based on node navigability

Centrality metrics are a popular tool in Network Science to identify important nodes within a graph. We introduce the Potential Gain as a centrality measure that unifies many walk-based centrality metrics in graphs and captures the notion of node navigability, interpreted as the property of being reachable from anywhere else (in the graph) through short walks. Two instances of the Potential Gain (called the Geometric and the Exponential Potential Gain) are presented and we describe scalable algorithms for computing them on large graphs. We also give a proof of the relationship between the new measures and established centralities. The geometric potential gain of a node can thus be characterized as the product of its Degree centrality by its Katz centrality scores. At the same time, the exponential potential gain of a node is proved to be the product of Degree centrality by its Communicability index. These formal results connect potential gain to both the "popularity" and "similarity" properties that are captured by the above centralities.
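The stated characterization (geometric potential gain as Degree centrality times Katz centrality) can be illustrated with a small sketch. This is not the authors' scalable algorithm: the truncated power series, the attenuation factor `alpha`, and the convention that the Katz series starts at walks of length 1 are all assumptions made for illustration.

```python
def katz_centrality(adj, alpha=0.1, terms=50):
    """Truncated Katz series: sum over k = 1..terms of alpha^k times the
    number of walks of length k ending at each node (undirected graph)."""
    nodes = list(adj)
    walks = {v: 1.0 for v in nodes}  # one walk of length 0 per node
    katz = {v: 0.0 for v in nodes}
    for k in range(1, terms + 1):
        # A walk of length k ending at v extends a walk of length k-1 ending at a neighbor.
        walks = {v: sum(walks[u] for u in adj[v]) for v in nodes}
        for v in nodes:
            katz[v] += (alpha ** k) * walks[v]
    return katz

def degree_times_katz(adj, alpha=0.1, terms=50):
    """Product of degree and Katz score per node (the product named in the abstract)."""
    katz = katz_centrality(adj, alpha, terms)
    return {v: len(adj[v]) * katz[v] for v in adj}

# Hypothetical star graph: hub "c" with three leaves.
star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
scores = degree_times_katz(star)
print(max(scores, key=scores.get))
```

The series only converges when `alpha` is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, so the default value here is only suitable for small, sparse examples.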

preprint · 2020 · arXiv

COVID-19 Contact Tracing: Eight Privacy Questions Explored

We respond to a recent short paper by de Montjoye et al. on privacy issues with Covid-19 tracking. Their paper, which we discuss here, is structured around three "toy protocols" for the design of an app which can maximise the utility of contact tracing information while minimising the more general risk to privacy. On this basis, the paper proceeds to introduce eight questions against which they should be assessed. The questions raised and the protocols proposed effectively amount to the creation of a game with different categories of players able to make different moves. It is therefore possible to analyse the model in terms of optimal game design.

preprint · 2020 · arXiv

Exploring Low-degree Nodes First Accelerates Network Exploration

We consider information diffusion on Web-like networks and how random walks can simulate it. A well-studied problem in this domain is Partial Cover Time, i.e., the calculation of the expected number of steps a random walker needs to visit a given fraction of the nodes of the network. We notice that some of the fastest solutions in fact require that nodes have perfect knowledge of the degree distribution of their neighbors, which in many practical cases is not obtainable, e.g., for privacy reasons. We thus introduce a version of the Cover problem that considers such limitations: Partial Cover Time with Budget. The budget is a limit on the number of neighbors that can be inspected for their degree; we have adapted optimal random-walk strategies from the literature to operate under such a budget. Our solution is called Min-degree (MD) and, essentially, it biases random walkers towards visiting peripheral areas of the network first. Extensive benchmarking on six real datasets shows that the (perhaps counter-intuitive) MD strategy is in fact highly competitive with state-of-the-art algorithms for cover.
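The idea of a budgeted, low-degree-biased walk can be sketched as follows. This is a simplified illustration and not the paper's MD rule: the inverse-degree weighting, the function name, and the `max_steps` safety cap are assumptions. At each step the walker inspects at most `budget` randomly chosen neighbors and moves to one of them with probability proportional to the reciprocal of its degree.

```python
import math
import random

def biased_partial_cover_steps(adj, start, fraction, budget, rng, max_steps=100_000):
    """Steps a low-degree-biased walker needs to visit `fraction` of the nodes.
    Assumes a connected graph without isolated nodes; returns None if the
    step cap is reached first."""
    target = math.ceil(fraction * len(adj))
    visited = {start}
    current = start
    steps = 0
    while len(visited) < target:
        if steps >= max_steps:
            return None
        neighbors = adj[current]
        # Only `budget` neighbors may be inspected for their degree.
        sample = rng.sample(neighbors, min(budget, len(neighbors)))
        weights = [1.0 / len(adj[v]) for v in sample]  # favor low-degree nodes
        current = rng.choices(sample, weights=weights)[0]
        visited.add(current)
        steps += 1
    return steps

# Hypothetical path graph 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
print(biased_partial_cover_steps(path, 0, 1.0, budget=2, rng=random.Random(0)))
```

Averaging the returned step count over many runs and start nodes gives an empirical estimate of the partial cover time under the chosen budget.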

preprint · 2020 · arXiv

Potential gain as a centrality measure

Navigability is a distinctive feature of graphs associated with artificial or natural systems whose primary goal is the transportation of information or goods. We say that a graph $\mathcal{G}$ is navigable when an agent is able to efficiently reach any target node in $\mathcal{G}$ by means of local routing decisions. In a social network, navigability translates to the ability to reach an individual through personal contacts. Graph navigability is well-studied, but a fundamental question is still open: why are some individuals more likely than others to be reached via short, friend-of-a-friend communication chains? In this article we answer the question above by proposing a novel centrality metric called the potential gain, which, in an informal sense, quantifies the ease with which a target node can be reached. We define two variants of the potential gain, called the geometric and the exponential potential gain, and present fast algorithms to compute them. The geometric and the exponential potential gain are the first instances of a novel class of composite centrality metrics, i.e., centrality metrics which combine the popularity of a node in $\mathcal{G}$ with its similarity to all other nodes. As shown in previous studies, popularity and similarity are two main criteria which regulate the way humans seek information in large networks such as Wikipedia. We give a formal proof that the potential gain of a node is always equivalent to the product of its degree centrality (which captures popularity) and its Katz centrality (which captures similarity).

preprint · 2015 · arXiv

A Note on Flagg and Friedman's Epistemic and Intuitionistic Formal Systems

We report our findings on the properties of Flagg and Friedman's translation from Epistemic into Intuitionistic logic, which was proposed as the basis of a comprehensive proof method for the faithfulness of the Gödel translation. We focus on the propositional case and raise the issue of the admissibility of the translated necessitation rule. Then, we contribute to Flagg and Friedman's program by giving an explicit proof of the soundness of their translation.

preprint · 2015 · arXiv

Adaptive Search over Sorted Sets

We revisit the classical algorithms for searching over sorted sets to introduce an algorithm refinement, called Adaptive Search, that combines the good features of Interpolation search and those of Binary search. With respect to Interpolation search, only a constant number of extra comparisons is introduced. Yet, under diverse input data distributions our algorithm shows costs comparable to that of Interpolation search, i.e., O(log log n), while the worst-case cost is always in O(log n), as with Binary search. On benchmarks drawn from large datasets, both synthetic and real-life, Adaptive search achieves better running times and fewer memory accesses than even Santoro and Sidney's Interpolation-Binary search.
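The general idea of combining the two classical searches can be sketched as below. This is a generic interleaving of one interpolation probe and one bisection probe per round, assuming integer keys; it is not the Adaptive Search algorithm from the paper, whose details the abstract does not give. The bisection probe guarantees that the interval at least halves each round, which bounds the worst case by O(log n).

```python
def hybrid_search(a, key):
    """Return an index i with a[i] == key, or -1 if key is absent.
    `a` must be a sorted list of integers."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[lo] == a[hi]:
            return lo  # all values in range are equal, hence equal to key
        # Interpolation probe: guess the position from the key's value.
        pos = lo + (key - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == key:
            return pos
        if a[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
        if lo > hi:
            break
        # Bisection probe: halves whatever interval remains.
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = [1, 3, 4, 7, 9, 12, 100]
print(hybrid_search(data, 7), hybrid_search(data, 5))
```

On roughly uniform data the interpolation probe usually lands close to the target, while the bisection probe limits the damage on skewed data such as the outlier `100` in the example.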

preprint · 2015 · arXiv

MORE: Merged Opinions Reputation Model

Reputation is generally defined as the opinion of a group on an aspect of a thing. This paper presents a reputation model that follows a probabilistic modelling of opinions based on three main concepts: (1) the value of an opinion decays with time, (2) the reputation of the opinion source impacts the reliability of the opinion, and (3) the certainty of the opinion impacts its weight with respect to other opinions. Furthermore, the model is flexible with its opinion sources: it may use explicit opinions or implicit opinions that can be extracted from agent behavior in domains where explicit opinions are sparse. We illustrate the latter with an approach to extract opinions from behavioral information in the sports domain, focusing on football in particular. One of the uses of a reputation model is predicting behavior. We take up the challenge of predicting the behavior of football teams in football matches, which we argue is a very interesting yet difficult approach for evaluating the model.

preprint · 2015 · arXiv

Qsmodels: ASP Planning in Interactive Gaming Environment

Qsmodels is a novel application of Answer Set Programming to an interactive gaming environment. We describe a software architecture by which the behavior of a bot acting inside Quake 3 Arena can be controlled by a planner. The planner is written as an Answer Set Program and is interpreted by the Smodels solver.

preprint · 2015 · arXiv

RDF annotation of Second Life objects: Knowledge Representation meets Social Virtual reality

We have designed and implemented an application running inside Second Life that supports user annotation of graphical objects and graphical visualization of concept ontologies, thus providing a formal, machine-accessible description of objects. As a result, we offer a platform that combines the graphical knowledge representation that is expected from a MUVE artifact with the semantic structure given by the Resource Description Framework (RDF) representation of information.

preprint · 2014 · arXiv

Analysis of a heterogeneous social network of humans and cultural objects

Modern online social platforms enable their members to be involved in a broad range of activities like making friends, joining groups, posting and commenting on resources, and so on. In this paper we investigate whether a correlation emerges across the different activities a user can take part in. To perform our analysis we focused on aNobii, a social platform with a world-wide user base of book readers, who like to post their readings, give ratings, review books and discuss them with friends and fellow readers. aNobii presents a heterogeneous structure: i) part social network, with user-to-user interactions, ii) part interest network, with the management of book collections, and iii) part folksonomy, with books that are tagged by the users. We analyzed a complete and anonymized snapshot of aNobii and we focused on three specific activities a user can perform, namely her tagging behavior, her tendency to join groups and her aptitude to compile a wishlist reporting the books she is planning to read. In this way each user is associated with a tag-based, a group-based and a wishlist-based profile. Experimental analysis carried out by means of Information Theory tools like entropy and mutual information suggests that tag-based and group-based profiles are in general more informative than wishlist-based ones. Furthermore, we discover that the degree of correlation between the three profiles associated with the same user tends to be small. Hence, user profiling cannot be reduced to considering just any one type of user activity (however important); it is crucial to incorporate multiple dimensions to effectively describe users' preferences and behavior.
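The two Information Theory tools mentioned above can be sketched on paired categorical observations. The function names and toy data are hypothetical; the estimates are the plain plug-in (frequency-based) versions of Shannon entropy and mutual information, which may differ from the estimators used in the paper.

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in Shannon entropy, in bits, of a sequence of hashable labels."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical per-user labels from two profile types.
tags = [0, 0, 1, 1]
groups_same = [0, 0, 1, 1]   # fully determined by tags
groups_indep = [0, 1, 0, 1]  # independent of tags
print(mutual_information(tags, groups_same), mutual_information(tags, groups_indep))
```

A mutual information close to zero between two profiles, as in the second call, is what a low degree of correlation between profile types would look like under this measure.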

preprint · 2014 · arXiv

Characterizing and computing stable models of logic programs: The non-stratified case

Stable Logic Programming (SLP) is an emergent, alternative style of logic programming: each solution to a problem is represented by a stable model of a deductive database/function-free logic program encoding the problem itself. Several implementations now exist for stable logic programming, and their performance is rapidly improving. To make SLP generally applicable, it should be possible to check for consistency (i.e., existence of stable models) of the input program before attempting to answer queries. In the literature, only rather strong sufficient conditions have been proposed for consistency, e.g., stratification. This paper extends these results in several directions. First, the syntactic features of programs, viz. cyclic negative dependencies, affecting the existence of stable models are characterized, and their relevance is discussed. Next, a new graph representation of logic programs, the Extended Dependency Graph (EDG), is introduced, which conveys enough information for reasoning about stable models (while the traditional Dependency Graph does not). Finally, we show that the problem of the existence of stable models can be reformulated in terms of coloring of the EDG.
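The consistency question (does a program have any stable model?) can be illustrated with a brute-force check based on the standard Gelfond-Lifschitz reduct. This sketch does not implement the EDG coloring approach of the paper; the rule encoding as `(head, positive_body, negative_body)` triples and the function names are assumptions, and the enumeration is exponential in the number of atoms.

```python
from itertools import combinations

def least_model(positive_rules):
    """Least model of a negation-free program given as (head, positive_body) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model

def is_stable(rules, candidate):
    """Build the reduct w.r.t. `candidate` and compare its least model to the candidate."""
    # Drop rules blocked by the candidate, then drop the remaining negative literals.
    reduct = [(head, pos) for head, pos, neg in rules if not (neg & candidate)]
    return least_model(reduct) == candidate

def stable_models(rules):
    """Enumerate all stable models by testing every subset of atoms."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, p, n in rules for a in p | n})
    return [set(s) for r in range(len(atoms) + 1)
            for s in combinations(atoms, r) if is_stable(rules, set(s))]

# p :- not q.   q :- not p.   (even negative cycle: two stable models)
even_cycle = [("p", set(), {"q"}), ("q", set(), {"p"})]
# p :- not p.   (odd negative cycle: no stable model)
odd_cycle = [("p", set(), {"p"})]
print(stable_models(even_cycle), stable_models(odd_cycle))
```

The two toy programs show why cyclic negative dependencies matter: the even cycle is consistent, while the odd cycle makes the program inconsistent.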

preprint · 2014 · arXiv

On Facebook, most ties are weak

Pervasive socio-technical networks bring new conceptual and technological challenges to developers and users alike. A central research theme is evaluation of the intensity of relations linking users and how they facilitate communication and the spread of information. These aspects of human relationships have been studied extensively in the social sciences under the framework of the "strength of weak ties" theory proposed by Mark Granovetter. Some research has considered whether that theory can be extended to online social networks like Facebook, suggesting interaction data can be used to predict the strength of ties. The approaches being used require handling user-generated data that is often not publicly available due to privacy concerns. Here, we propose an alternative definition of weak and strong ties that requires knowledge of only the topology of the social network (such as who is a friend of whom on Facebook), relying on the fact that online social networks, or OSNs, tend to fragment into communities. We thus suggest classifying as weak ties those edges linking individuals belonging to different communities and strong ties as those connecting users in the same community. We tested this definition on a large network representing part of the Facebook social graph and studied how weak and strong ties affect the information-diffusion process. Our findings suggest individuals in OSNs self-organize to create well-connected communities, while weak ties yield cohesion and optimize the coverage of information spread.
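The topology-only definition above translates directly into a short sketch. It assumes that a community label per user is already available (from any community detection algorithm); the function name and the toy graph are hypothetical.

```python
def classify_ties(edges, community):
    """Split edges into weak ties (endpoints in different communities)
    and strong ties (endpoints in the same community)."""
    weak, strong = [], []
    for u, v in edges:
        (strong if community[u] == community[v] else weak).append((u, v))
    return weak, strong

# Hypothetical friendship graph: two triangles joined by one bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
weak, strong = classify_ties(edges, community)
print(weak, len(strong))
```

In the toy graph only the bridge `(2, 3)` is a weak tie, which matches the intuition that weak ties are the edges holding separate communities together.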

preprint · 2013 · arXiv

Enhancing community detection using a network weighting strategy

A community within a network is a group of vertices densely connected to each other but less connected to the vertices outside. The problem of detecting communities in large networks plays a key role in a wide range of research areas, e.g. Computer Science, Biology and Sociology. Most of the existing algorithms to find communities count on the topological features of the network and often do not scale well on large, real-life instances. In this article we propose a strategy to enhance existing community detection algorithms by adding a pre-processing step in which edges are weighted according to their centrality w.r.t. the network topology. In our approach, the centrality of an edge reflects its contribution to making arbitrary graph traversals, i.e., spreading messages over the network, as short as possible. Our strategy effectively complements information about network topology and it can be used as an additional tool to enhance community detection. The computation of edge centralities is carried out by performing multiple random walks of bounded length on the network. Our method makes the computation of edge centralities feasible also on large-scale networks. It has been tested in conjunction with three state-of-the-art community detection algorithms, namely the Louvain method, COPRA and OSLOM. Experimental results show that our method raises the accuracy of existing algorithms both on synthetic and real-life datasets.

preprint · 2013 · arXiv

Mixing local and global information for community detection in large networks

The problem of clustering large complex networks plays a key role in several scientific fields ranging from Biology to Sociology and Computer Science. Many approaches to clustering complex networks are based on the idea of maximizing a network modularity function. Some of these approaches can be classified as global because they exploit knowledge about the whole network topology to find clusters. Other approaches, instead, can be interpreted as local because they require only a partial knowledge of the network topology, e.g., the neighbors of a vertex. Global approaches are able to achieve high values of modularity but they do not scale well on large networks and, therefore, they cannot be applied to analyze on-line social networks like Facebook or YouTube. In contrast, local approaches are fast and scale up to large, real-life networks, at the cost of poorer results than those achieved by global methods. In this article we propose a glocal method for maximizing modularity, i.e., our method uses information at the global level, yet its scalability on large networks is comparable to that of local methods. The proposed method is called COmplex Network CLUster DEtection (CONCLUDE, for short). It works in two stages: in the first stage it uses an information-propagation model, based on random and non-backtracking walks of finite length, to compute the importance of each edge in keeping the network connected (called edge centrality). Then, edge centrality is used to map network vertices onto points of a Euclidean space and to compute distances between all pairs of connected vertices. In the second stage, CONCLUDE uses the distances computed in the first stage to partition the network into clusters. CONCLUDE is computationally efficient since in the average case its cost is roughly linear in the number of edges of the network.
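The modularity function that such methods maximize can be sketched for a given partition. The abstract does not spell out the formula, so this assumes the standard Newman-Girvan modularity for an undirected, unweighted graph; it only scores a partition and does not implement CONCLUDE. Names and the toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity: sum over communities c of
    L_c / m - (d_c / (2 m))^2, with m edges in total, L_c edges inside c,
    and d_c the total degree of the nodes in c."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical graph: two triangles joined by one bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Splitting the toy graph into its two triangles scores 5/14, while putting every vertex in one cluster scores 0, so a modularity-maximizing method would prefer the split.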

preprint · 2012 · arXiv

Generalized Louvain Method for Community Detection in Large Networks

In this paper we present a novel strategy to discover the community structure of (possibly large) networks. This approach is based on the well-known concept of network modularity optimization. To do so, our algorithm exploits a novel measure of edge centrality, based on k-paths. This technique makes it possible to compute an edge ranking in large networks efficiently, in near-linear time. Once the centrality ranking is calculated, the algorithm computes the pairwise proximity between nodes of the network. Finally, it discovers the community structure by adopting a strategy inspired by the well-known state-of-the-art Louvain method (henceforth, LM), efficiently maximizing the network modularity. The experiments we carried out show that our algorithm outperforms other techniques and slightly improves on the results of the original LM, providing reliable results. Another advantage is that its adoption is naturally extended even to unweighted networks, unlike the LM.

preprint · 2011 · arXiv

Crawling Facebook for Social Network Analysis Purposes

We describe our work in the collection and analysis of massive data describing the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice, against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, including degree distribution, centrality measures, scaling laws and distribution of friendship.

preprint · 2011 · arXiv

Improving Recommendation Quality by Merging Collaborative Filtering and Social Relationships

Matrix Factorization techniques have been successfully applied to raise the quality of suggestions generated by Collaborative Filtering Systems (CFSs). Traditional CFSs based on Matrix Factorization operate on the ratings provided by users and have been recently extended to incorporate demographic aspects such as age and gender. In this paper we propose to merge CFSs based on Matrix Factorization with information regarding social friendships in order to provide users with more accurate suggestions and rankings on items of their interest. The proposed approach has been evaluated on a real-life online social network; the experimental results show an improvement over existing CFSs. A detailed comparison with related literature is also presented.

preprint · 2004 · arXiv

Normal forms for Answer Sets Programming

Normal forms for logic programs under stable/answer set semantics are introduced. We argue that these forms can simplify the study of program properties, mainly consistency. The first normal form, called the kernel of the program, is useful for studying existence and number of answer sets. A kernel program is composed of the atoms which are undefined in the Well-founded semantics, which are those that directly affect the existence of answer sets. The body of rules is composed of negative literals only. Thus, the kernel form tends to be significantly more compact than other formulations. Also, it is possible to check consistency of kernel programs in terms of colorings of the Extended Dependency Graph program representation which we previously developed. The second normal form is called 3-kernel. A 3-kernel program is composed of the atoms which are undefined in the Well-founded semantics. Rules in 3-kernel programs have at most two conditions, and each rule either belongs to a cycle, or defines a connection between cycles. 3-kernel programs may have positive conditions. The 3-kernel normal form is very useful for the static analysis of program consistency, i.e., the syntactic characterization of existence of answer sets. This result can be obtained thanks to a novel graph-like representation of programs, called the Cycle Graph, which is presented in the companion article [Cos04b].
