Researcher profile

Gianmarco De Francisci Morales

Gianmarco De Francisci Morales contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
11 works · 0 followers · 6 topics · 4 close collaborators

Published work

11 published items

Preprint · 2022 · arXiv

FreSCo: Mining Frequent Patterns in Simplicial Complexes

Simplicial complexes are a generalization of graphs that model higher-order relations. In this paper, we introduce simplicial patterns -- that we call simplets -- and generalize the task of frequent pattern mining from the realm of graphs to that of simplicial complexes. Our task is particularly challenging due to the enormous search space and the need for higher-order isomorphism. We show that finding the occurrences of simplets in a complex can be reduced to a bipartite graph isomorphism problem, in linear time and at most quadratic space. We then propose an anti-monotonic frequency measure that allows us to start the exploration from small simplets and stop expanding a simplet as soon as its frequency falls below the minimum frequency threshold. Equipped with these ideas and a clever data structure, we develop a memory-conscious algorithm that, by carefully exploiting the relationships among the simplices in the complex and among the simplets, achieves efficiency and scalability for our complex mining task. Our algorithm, FreSCo, comes in two flavors: it can compute the exact frequency of the simplets or, more quickly, it can determine whether a simplet is frequent, without having to compute the exact frequency. Experimental results prove the ability of FreSCo to mine frequent simplets in complexes of various size and dimension, and the significance of the simplets with respect to the traditional graph patterns.
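The pruning idea in this abstract (start from small patterns and stop expanding one once its frequency drops below the threshold) can be illustrated with a much simpler pattern type. The sketch below is not FreSCo and does not handle simplets or isomorphism; it assumes plain itemsets and a hypothetical `frequent_itemsets` helper, and only shows how an anti-monotonic frequency measure lets a level-wise search discard candidates early.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search with anti-monotone pruning (itemsets, not simplets)."""

    def support(candidate):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for t in transactions if candidate <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        survivors = []
        for candidate in level:
            count = support(candidate)
            if count >= min_support:
                frequent[candidate] = count
                survivors.append(candidate)
        # Expand only frequent patterns; a candidate is kept only if all of
        # its sub-patterns of the previous size are frequent as well.
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in frequent
                for sub in combinations(union, len(a))
            ):
                next_level.add(union)
        level = list(next_level)
    return frequent


# Toy data (hypothetical): four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = frequent_itemsets(transactions, min_support=2)
```

In this toy run the pair {b, c} appears only once, so it is dropped, and the triple {a, b, c} is never counted because one of its sub-patterns is infrequent.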

Preprint · 2022 · arXiv

On the Relation Between Opinion Change and Information Consumption on Reddit

While much attention has been devoted to the causes of opinion change, little is known about its consequences. Our study sheds light on the relationship between a user's opinion change episode and subsequent behavioral change on an online social media platform, Reddit. In particular, we look at r/ChangeMyView, an online community dedicated to debating one's own opinions. Interestingly, this forum adopts a well-codified schema for explicitly self-reporting opinion change. Starting from this ground truth, we analyze changes in future online information consumption behavior that arise after a self-reported opinion change on sociopolitical topics, operationalized in this work as participation in sociopolitical subreddits. Such a participation profile is important as it represents one's information diet, and is a reliable proxy for, e.g., political affiliation or health choices. We find that people who report an opinion change are significantly more likely to change their future participation in a specific subset of online communities. We characterize which communities are more likely to be abandoned after opinion change, and find a significant association (r=0.46) between propaganda-like language used in a community and the increase in chances of leaving it. We find comparable results (r=0.39) for the opposite direction, i.e., joining a community. This finding suggests that propagandistic communities act as a first gateway to internalize a shift in one's sociopolitical opinion. Finally, we show that the textual content of the discussion associated with opinion change is indicative of which communities are going to be subject to a participation change. In fact, a predictive model based only on the opinion change post is able to pinpoint these communities with an AP@5 of 0.20, similar to what can be reached by using all the past history of participation in communities.

Preprint · 2021 · arXiv

No Echo in the Chambers of Political Interactions on Reddit

Echo chambers in online social networks, whereby users' beliefs are reinforced by interactions with like-minded peers and insulation from others' points of view, have been decried as a cause of political polarization. Here, we investigate their role in the debate around the 2016 US elections on Reddit, a fundamental platform for the success of Donald Trump. We identify Trump vs Clinton supporters and reconstruct their political interaction network. We observe a preference for cross-cutting political interactions between the two communities rather than within-group interactions, thus contradicting the echo chamber narrative. Furthermore, these interactions are asymmetrical: Clinton supporters are particularly eager to answer comments by Trump supporters. Besides asymmetric heterophily, users show assortative behavior for activity, and disassortative, asymmetric behavior for popularity. Our findings are tested against a null model of random interactions, by using two different approaches: a network rewiring which preserves the activity of nodes, and a logit regression which takes into account possible confounding factors. Finally, we explore possible socio-demographic implications. Users show a tendency for geographical homophily and a small positive correlation between cross-interactions and voter abstention. Our findings shed light on public opinion formation on social media, calling for a better understanding of the social dynamics at play in this context.
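A rewiring null model of the kind mentioned in this abstract can be sketched as a directed edge swap. This is an illustrative assumption, not the paper's procedure: it treats a node's activity as its out-degree (number of replies written), uses a hypothetical `rewire` helper and made-up user names, and assumes the input has no duplicate edges.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=0):
    """Swap edge targets at random while preserving every node's degrees."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_first, new_second = (a, d), (c, b)
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new_first in present or new_second in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new_first, new_second}
        edges[i], edges[j] = new_first, new_second
    return edges


# Toy reply network (hypothetical): (author, user being answered).
observed = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "u1"), ("u1", "u3")]
shuffled = rewire(observed, n_swaps=100)
```

Each accepted swap exchanges the targets of two edges, so both the out-degree and the in-degree of every node stay fixed. A statistic measured on the observed network can then be compared with its distribution over many shuffled copies.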

Preprint · 2021 · arXiv

STruD: Truss Decomposition of Simplicial Complexes

A simplicial complex is a generalization of a graph: a collection of n-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to the graph-based one. This power, however, comes with important computational challenges derived from the combinatorial explosion caused by the downward closure property of complexes. Drawing upon ideas from itemset mining and similarity search, we design a memory-aware algorithm, dubbed STruD, which is able to efficiently compute the truss decomposition of a simplicial complex. STruD adapts its behavior to the amount of available memory by storing intermediate data in a compact way. We then devise a variant that computes directly the n simplices of maximum trussness. By applying STruD to several datasets, we prove its scalability, and provide an analysis of their structure. Finally, we show that the truss decomposition can be seen as a filtration, and as such it can be used to study the persistent homology of a dataset, a method for computing topological features at different spatial resolutions, prominent in Topological Data Analysis.

Preprint · 2020 · arXiv

Aion: Better Late than Never in Event-Time Streams

Processing data streams in near real-time is an increasingly important task. In the case of event-timestamped data, the stream processing system must promptly handle late events that arrive after the corresponding window has been processed. To enable this late processing, the window state must be maintained for a long period of time. However, current systems maintain this state in memory, which either imposes a maximum period of tolerated lateness, or causes the system to degrade performance or even crash when the system memory runs out. In this paper, we propose AION, a comprehensive solution for handling late events in an efficient manner, implemented on top of Flink. In designing AION, we go beyond a naive solution that transfers state between memory and persistent storage on demand. In particular, we introduce a proactive caching scheme, where we leverage the semantics of stream processing to anticipate the need for bringing data to memory. Furthermore, we propose a predictive cleanup scheme to permanently discard window state based on the likelihood of receiving more late events, to prevent storage consumption from growing without bounds. Our evaluation shows that AION is capable of maintaining sustainable levels of memory utilization while still preserving high throughput, low latency, and low staleness.

Preprint · 2020 · arXiv

Echo Chambers on Social Media: A comparative analysis

Recent studies have shown that online users tend to select information adhering to their system of beliefs, ignore information that does not, and join groups - i.e., echo chambers - around a shared narrative. Although a quantitative methodology for their identification is still missing, the phenomenon of echo chambers is widely debated at both the scientific and political levels. To shed light on this issue, we introduce an operational definition of echo chambers and perform a massive comparative analysis on more than 1B pieces of content produced by 1M users on four social media platforms: Facebook, Twitter, Reddit, and Gab. We infer the leaning of users about controversial topics - ranging from vaccines to abortion - and reconstruct their interaction networks by analyzing different features, such as shared link domains, followed pages, follower relationships and commented posts. Our method quantifies the existence of echo chambers along two main dimensions: homophily in the interaction networks and bias in the information diffusion toward like-minded peers. We find peculiar differences across social media. Indeed, while Facebook and Twitter present clear-cut echo chambers in all the observed datasets, Reddit and Gab do not. Finally, we test the role of the social media platform on news consumption by comparing Reddit and Facebook. Again, we find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo chambers.

Preprint · 2020 · arXiv

Falling into the Echo Chamber: the Italian Vaccination Debate on Twitter

The reappearance of measles in the US and Europe, a disease considered eliminated in the early 2000s, has been accompanied by a growing debate on the merits of vaccination on social media. In this study we examine the extent to which the vaccination debate on Twitter is conducive to potential outreach to the vaccination hesitant. We focus on Italy, one of the countries most affected by the latest measles outbreaks. We discover that the vaccination skeptics, as well as the advocates, reside in their own distinct "echo chambers". The structure of these communities differs as well, with skeptics arranged in a tightly connected cluster, and advocates organizing themselves around a few authoritative hubs. At the center of these echo chambers we find the ardent supporters, for whom we build highly accurate network- and content-based classifiers (attaining 95% cross-validated accuracy). Insights of this study provide several avenues for potential future interventions, including network-guided targeting, accounting for the political context, and monitoring of alternative sources of information.

Preprint · 2020 · arXiv

Learning Opinion Dynamics From Social Traces

Opinion dynamics - the research field dealing with how people's opinions form and evolve in a social context - traditionally uses agent-based models to validate the implications of sociological theories. These models encode the causal mechanism that drives the opinion formation process, and have the advantage of being easy to interpret. However, as they do not exploit the availability of data, their predictive power is limited. Moreover, parameter calibration and model selection are manual and difficult tasks. In this work we propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces. Given a set of observables (e.g., actions and interactions between agents), our model can recover the most-likely latent opinion trajectories that are compatible with the assumptions about the process dynamics. This type of model retains the benefits of agent-based ones (i.e., causal interpretation), while adding the ability to perform model selection and hypothesis testing on real data. We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart. We then design an inference algorithm based on online expectation maximization to learn the latent parameters of the model. This algorithm can recover the latent opinion trajectories from traces generated by the classical agent-based model. In addition, it can identify the most likely set of macro parameters used to generate a data trace, thus allowing testing of sociological hypotheses. Finally, we apply our model to real-world data from Reddit to explore the long-standing question about the impact of the backfire effect. Our results suggest a low prominence of the effect in Reddit's political conversation.

Preprint · 2020 · arXiv

Link Prediction via Higher-Order Motif Features

Link prediction requires predicting which new links are likely to appear in a graph. Being able to predict unseen links with good accuracy has important applications in several domains such as social media, security, transportation, and recommendation systems. A common approach is to use features based on the common neighbors of an unconnected pair of nodes to predict whether the pair will form a link in the future. In this paper, we present an approach for link prediction that relies on higher-order analysis of the graph topology, well beyond common neighbors. We treat the link prediction problem as a supervised classification problem, and we propose a set of features that depend on the patterns or motifs that a pair of nodes occurs in. By using motifs of sizes 3, 4, and 5, our approach captures a high level of detail about the graph topology within the neighborhood of the pair of nodes, which leads to a higher classification accuracy. In addition to proposing the use of motif-based features, we also propose two optimizations related to constructing the classification dataset from the graph. First, to ensure that positive and negative examples are treated equally when extracting features, we propose adding the negative examples to the graph as an alternative to the common approach of removing the positive ones. Second, we show that it is important to control for the shortest-path distance when sampling pairs of nodes to form negative examples, since the difficulty of prediction varies with the shortest-path distance. We experimentally demonstrate that using off-the-shelf classifiers with a well-constructed classification dataset results in an increase of up to 10 percentage points in accuracy over prior topology-based and feature learning methods.
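Two ingredients named in this abstract, the common-neighbors baseline feature and negative examples chosen at a controlled shortest-path distance, can be sketched in a few lines. The code is a toy under stated assumptions, not the authors' implementation: the graph, the helper names (`bfs_distances`, `common_neighbors`, `negatives_at_distance`) and the choice to enumerate all pairs rather than sample them are illustrative, and the motif features themselves are not computed.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def common_neighbors(adj, u, v):
    """Baseline feature: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def negatives_at_distance(adj, d):
    """All unordered node pairs whose shortest-path distance is exactly d."""
    pairs = []
    for u in sorted(adj):
        dist = bfs_distances(adj, u)
        pairs.extend((u, v) for v, hops in dist.items() if hops == d and u < v)
    return sorted(pairs)


# Toy undirected graph (hypothetical), stored as adjacency sets.
adj = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}
negatives = negatives_at_distance(adj, 2)
features = {pair: common_neighbors(adj, *pair) for pair in negatives}
```

For any `d` of at least 2 the selected pairs are unconnected by construction. Building one candidate pool per distance makes it possible to evaluate a classifier separately on easy and hard negatives.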

Preprint · 2020 · arXiv

Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit

We study the emergence of support for Donald Trump in Reddit's political discussion. With almost 800k subscribers, "r/The_Donald" is one of the largest communities on Reddit, and one of the main hubs for Trump supporters. It was created in 2015, shortly after Donald Trump began his presidential campaign. By using only data from 2012, we predict the likelihood of being a supporter of Donald Trump in 2016, the year of the last US presidential elections. To characterize the behavior of Trump supporters, we draw from three different sociological hypotheses: homophily, social influence, and social feedback. We operationalize each hypothesis as a set of features for each user, and train classifiers to predict their participation in r/The_Donald. We find that homophily-based and social feedback-based features are the most predictive signals. Conversely, we do not observe a strong impact of social influence mechanisms. We also perform an introspection of the best-performing model to build a "persona" of the typical supporter of Donald Trump on Reddit. We find evidence that the most prominent traits include a predominance of masculine interests, a conservative and libertarian political leaning, and links with politically incorrect and conspiratorial content.

Preprint · 2011 · arXiv

Social content matching in MapReduce

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.
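The matching objective in this abstract (maximize total relevance while capping how much each supplier delivers and each consumer receives) can be illustrated with a sequential greedy heuristic. This sketch is not GreedyMR or StackMR and involves no MapReduce; the `greedy_b_matching` name, the per-node capacities and the toy relevance scores are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (supplier, consumer, relevance)


def greedy_b_matching(edges: List[Edge], capacity: Dict[str, int]) -> List[Edge]:
    """Pick edges by decreasing relevance while both endpoints have capacity."""
    remaining = dict(capacity)
    matched = []
    for supplier, consumer, relevance in sorted(
        edges, key=lambda e: e[2], reverse=True
    ):
        if remaining.get(supplier, 0) > 0 and remaining.get(consumer, 0) > 0:
            matched.append((supplier, consumer, relevance))
            remaining[supplier] -= 1
            remaining[consumer] -= 1
    return matched


# Toy instance (hypothetical): every node may take part in one match.
edges = [
    ("s1", "c1", 0.9),
    ("s1", "c2", 0.8),
    ("s2", "c1", 0.7),
    ("s2", "c2", 0.1),
]
capacity = {"s1": 1, "s2": 1, "c1": 1, "c2": 1}
result = greedy_b_matching(edges, capacity)
total = sum(relevance for _, _, relevance in result)
```

On this instance the greedy choice locks in the 0.9 edge and ends with total relevance 1.0, whereas pairing s1 with c2 and s2 with c1 would reach 1.5. This shows why such heuristics come with approximation guarantees rather than optimality.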