Source author record

Emilio Ferrara

Emilio Ferrara appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

53 works

20 topics

4 close collaborators

preprint · 2026 · arXiv

Gendered Pathways in AI Companionship: Cross-Community Behavior and Toxicity Patterns on Reddit

AI-companionship platforms are rapidly reshaping how people form emotional, romantic, and parasocial bonds with non-human agents, raising new questions about how these relationships intersect with gendered online behavior and exposure to harmful content. Focusing on the MyBoyfriendIsAI (MBIA) subreddit, we reconstruct the Reddit activity histories of more than 3,000 highly engaged users over two years, yielding over 67,000 historical submissions. We then situate MBIA within a broader ecosystem by building a historical interaction network spanning more than 2,000 subreddits, which enables us to trace cross-community pathways and measure how toxicity and emotional expression vary across these trajectories. We find that MBIA users primarily traverse four surrounding community spheres (AI-companionship, porn-related, forum-like, and gaming) and that participation across the ecosystem exhibits a distinct gendered structure, with substantial engagement by female users. While toxicity is generally low across most pathways, we observe localized spikes concentrated in a small subset of AI-porn and gender-oriented communities. Nearly 16% of users engage with gender-focused subreddits, and their trajectories display systematically different patterns of emotional expression and elevated toxicity, suggesting that a minority of gendered pathways may act as toxicity amplifiers within the broader AI-companionship ecosystem. These results characterize the gendered structure of cross-community participation around AI companionship on Reddit and highlight where risks concentrate, informing measurement, moderation, and design practices for human-AI relationship platforms.

preprint · 2026 · arXiv

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber simulation. Across multiple models, minor perturbations in persona format and game-instruction framing shift cooperation rates by up to 76 percentage points, while network homophily and hub assignment produce significant and consistent shifts in polarization metrics. We also find that sensitivity is unevenly distributed across both architectural choices and model families: the same perturbation that produces the 76 pp shift in one frontier model only shifts another by 1 pp. Robustness is therefore a property that should be measured per claim and per model, not assumed. To address this validation gap, we introduce TRAILS (Taxonomy for Robustness Audits In LLM Simulations), a robustness-audit taxonomy spanning three levels of simulation design: agent (micro-level), interaction (meso-level), and system (macro-level). We call for robustness to become a first-order validation requirement before LLM social simulations are used to explain mechanisms, evaluate interventions, or inform decisions.

preprint · 2026 · arXiv

The Generative AI Paradox: GenAI and the Erosion of Trust, the Corrosion of Information Verification, and the Demise of Truth

Generative AI (GenAI) now produces text, images, audio, and video that can be perceptually convincing at scale and at negligible marginal cost. While public debate often frames the associated harms as "deepfakes" or incremental extensions of misinformation and fraud, this view misses a broader socio-technical shift: GenAI enables synthetic realities, that is, coherent, interactive, and potentially personalized information environments in which content, identity, and social interaction are jointly manufactured and mutually reinforcing. We argue that the most consequential risk is not merely the production of isolated synthetic artifacts, but the progressive erosion of shared epistemic ground and institutional verification practices as synthetic content, synthetic identity, and synthetic interaction become easy to generate and hard to audit. This paper (i) formalizes synthetic reality as a layered stack (content, identity, interaction, institutions), (ii) expands a taxonomy of GenAI harms spanning personal, economic, informational, and socio-technical risks, (iii) articulates the qualitative shifts introduced by GenAI (cost collapse, throughput, customization, micro-segmentation, provenance gaps, and trust erosion), and (iv) synthesizes recent risk realizations (2023-2025) into a compact case bank illustrating how these mechanisms manifest in fraud, elections, harassment, documentation, and supply-chain compromise. We then propose a mitigation stack that treats provenance infrastructure, platform governance, institutional workflow redesign, and public resilience as complementary rather than substitutable, and outline a research agenda focused on measuring epistemic security. We conclude with the Generative AI Paradox: as synthetic media becomes ubiquitous, societies may rationally discount digital evidence altogether.

preprint · 2023 · arXiv

Social Bots: Detection and Challenges

While social media are a key source of data for computational social science, their ease of manipulation by malicious actors threatens the integrity of online information exchanges and their analysis. In this Chapter, we focus on malicious social bots, a prominent vehicle for such manipulation. We start by discussing recent studies about the presence and actions of social bots in various online discussions to show their real-world implications and the need for detection methods. Then we discuss the challenges of bot detection methods and use Botometer, a publicly available bot detection tool, as a case study to describe recent developments in this area. We close with a practical guide on how to handle social bots in social media research.

preprint · 2023 · arXiv

Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network Data

The proliferation of social network data has unlocked unprecedented opportunities for extensive, data-driven exploration of human behavior. The structural intricacies of social networks offer insights into various computational social science issues, particularly concerning social influence and information diffusion. However, modeling large-scale social network data comes with computational challenges. Though large language models make it easier than ever to model textual content, many advanced network representation methods struggle with scalability and efficient deployment to out-of-sample users. In response, we introduce a novel approach tailored for modeling social network data in user detection tasks. This method integrates localized social network interactions with the capabilities of large language models. It operates under the premise of social network homophily, which posits that socially connected users share similarities. We conduct a thorough evaluation of our method across seven real-world social network datasets, spanning a diverse range of topics and detection tasks, showcasing its applicability to advance research in computational social science.

preprint · 2022 · arXiv

Botometer 101: Social bot practicum for computational social scientists

Social bots have become an important component of online social media. Deceptive bots, in particular, can manipulate online discussions of important issues ranging from elections to public health, threatening the constructive exchange of information. Their ubiquity makes them an interesting research subject and requires researchers to properly handle them when conducting studies using social media data. Therefore, it is important for researchers to gain access to bot detection tools that are reliable and easy to use. This paper aims to provide an introductory tutorial of Botometer, a public tool for bot detection on Twitter, for readers who are new to this topic and may not be familiar with programming and machine learning. We introduce how Botometer works, the different ways users can access it, and present a case study as a demonstration. Readers can use the case study code as a template for their own research. We also discuss recommended practice for using Botometer.

preprint · 2022 · arXiv

Construction of Large-Scale Misinformation Labeled Datasets from Social Media Discourse using Label Refinement

Malicious accounts spreading misinformation have led to widespread false and misleading narratives in recent times, especially during the COVID-19 pandemic, and social media platforms struggle to eliminate this content rapidly. This is because adapting to new domains requires human-intensive fact-checking that is slow and difficult to scale. To address this challenge, we propose to leverage news-source credibility labels as weak labels for social media posts and propose model-guided refinement of labels to construct large-scale, diverse misinformation labeled datasets in new domains. The weak labels can be inaccurate at the article or social media post level where the stance of the user does not align with the news source or article credibility. We propose a framework that uses a detection model self-trained on the initial weak labels, with uncertainty sampling based on the entropy of the model's predictions, to identify potentially inaccurate labels and correct them using self-supervision or relabeling. The framework will incorporate the social context of the post, in terms of the community of its associated user, for surfacing inaccurate labels towards building a large-scale dataset with minimum human effort. To provide labeled datasets that distinguish misleading narratives, where information might be missing significant context or has inaccurate ancillary details, the proposed framework will use the few labeled samples as class prototypes to separate high-confidence samples into false, unproven, mixture, mostly false, mostly true, true, and debunk information. The approach is demonstrated by providing a large-scale misinformation dataset on COVID-19 vaccines.
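The entropy-based uncertainty sampling mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary layout of `predictions`, and the choice of a fixed budget `k` are all assumptions made for illustration. The idea shown is only that posts whose predicted class distribution has high entropy are the ones whose weak labels are most worth re-checking.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def flag_uncertain(predictions, k):
    """Return the k post ids whose predictions have the highest entropy.

    `predictions` maps a (hypothetical) post id to a list of class
    probabilities produced by some detection model.
    """
    ranked = sorted(
        predictions,
        key=lambda post_id: prediction_entropy(predictions[post_id]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    preds = {
        "post_a": [0.50, 0.50],  # model is undecided
        "post_b": [0.99, 0.01],  # model is confident
        "post_c": [0.70, 0.30],
    }
    # The two least certain posts are candidates for relabeling.
    print(flag_uncertain(preds, 2))
```

A confident prediction such as `[0.99, 0.01]` has entropy close to zero and is left alone, while an even split reaches the maximum of log 2 for two classes.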

preprint · 2022 · arXiv

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Human-centered AI considers human experiences with AI performance. While abundant research has been helping AI achieve superhuman performance either by fully automatic or weak supervision learning, fewer endeavors are experimenting with how AI can tailor to humans' preferred skill level given fine-grained input. In this work, we guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process. To achieve this, we developed a portable, interactive platform that enables the user to interact with agents online via manipulating the task difficulty, observing performance, and providing curriculum feedback. Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications that require millions of samples without a server. The result demonstrates the effectiveness of an interactive curriculum for reinforcement learning involving human-in-the-loop. It shows reinforcement learning performance can successfully adjust in sync with the human desired difficulty level. We believe this research will open new doors for achieving flow and personalized adaptive difficulties.

preprint · 2022 · arXiv

Individual and Collective Performance Deteriorate in a New Team: A Case Study of CS:GO Tournaments

How does team formation relate to team performance in professional video game playing? This study examines one aspect of group dynamics, team switching, and aims to answer how changing teams affects individual and collective performance in eSports tournaments. We test the hypothesis that switching teams can be detrimental to individual and team performance both in the short term and in the long run. We collected data from professional tournaments of a popular first-person shooter game, Counter-Strike: Global Offensive (CS:GO), and performed two natural experiments. We found that a player's performance was inversely correlated with the number of teams the player had joined. After a player switched to a new team, both the individual and the collective performance dropped initially, and then slowly recovered. The findings in this study can provide insights for understanding group dynamics in eSports team play and emphasize the importance of team cohesion in facilitating team collaboration, coordination, and knowledge sharing in teamwork in general.

preprint · 2021 · arXiv

Individualized Context-Aware Tensor Factorization for Online Games Predictions

Individual behavior and decisions are substantially influenced by their contexts, such as location, environment, and time. Changes along these dimensions can be readily observed in Multiplayer Online Battle Arena games (MOBA), where players face different in-game settings for each match and are subject to frequent game patches. Existing methods utilizing contextual information generalize the effect of a context over the entire population, but contextual information tailored to each individual can be more effective. To achieve this, we present the Neural Individualized Context-aware Embeddings (NICE) model for predicting user performance and game outcomes. Our proposed method identifies individual behavioral differences in different contexts by learning latent representations of users and contexts through non-negative tensor factorization. Using a dataset from the MOBA game League of Legends, we demonstrate that our model substantially improves the prediction of winning outcome, individual user performance, and user engagement.

preprint · 2021 · arXiv

Social Bots and Social Media Manipulation in 2020: The Year in Review

The year 2020 will be remembered for two events of global significance: the COVID-19 pandemic and 2020 U.S. Presidential Election. In this chapter, we summarize recent studies using large public Twitter data sets on these issues. We have three primary objectives. First, we delineate epistemological and practical considerations when combining the traditions of computational research and social science research. A sensible balance should be struck when the stakes are high between advancing social theory and concrete, timely reporting of ongoing events. We additionally comment on the computational challenges of gleaning insight from large amounts of social media data. Second, we characterize the role of social bots in social media manipulation around the discourse on the COVID-19 pandemic and 2020 U.S. Presidential Election. Third, we compare results from 2020 to prior years to note that, although bot accounts still contribute to the emergence of echo-chambers, there is a transition from state-sponsored campaigns to domestically emergent sources of distortion. Furthermore, issues of public health can be confounded by political orientation, especially from localized communities of actors who spread misinformation. We conclude that automation and social media manipulation pose issues to a healthy and democratic discourse, precisely because they distort representation of pluralism within the public sphere.

preprint · 2021 · arXiv

Tracking e-cigarette warning label compliance on Instagram with deep learning

The U.S. Food & Drug Administration (FDA) requires that e-cigarette advertisements include a prominent warning label that reminds consumers that nicotine is addictive. However, the high volume of vaping-related posts on social media makes compliance auditing expensive and time-consuming, suggesting that an automated, scalable method is needed. We sought to develop and evaluate a deep learning system designed to automatically determine if an Instagram post promotes vaping, and if so, if an FDA-compliant warning label was included or if a non-compliant warning label was visible in the image. We compiled and labeled a dataset of 4,363 Instagram images, of which 44% were vaping-related, 3% contained FDA-compliant warning labels, and 4% contained non-compliant labels. Using a 20% test set for evaluation, we tested multiple neural network variations: image processing backbone model (Inceptionv3, ResNet50, EfficientNet), data augmentation, progressive layer unfreezing, output bias initialization designed for class imbalance, and multitask learning. Our final model achieved an area under the curve (AUC) and [accuracy] of 0.97 [92%] on vaping classification, 0.99 [99%] on FDA-compliant warning labels, and 0.94 [97%] on non-compliant warning labels. We conclude that deep learning models can effectively identify vaping posts on Instagram and track compliance with FDA warning label requirements.
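The abstract lists "output bias initialization designed for class imbalance" without giving details. A common way to do this, shown here as an assumption rather than as the authors' confirmed scheme, is to set the bias of a sigmoid output unit to the log-odds of the positive class so that the untrained model already predicts the base rate. The function names are hypothetical; the class fractions are the ones reported in the abstract.

```python
import math


def initial_output_bias(positive_fraction: float) -> float:
    """Log-odds of the positive class, used as the starting output bias."""
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must be strictly between 0 and 1")
    return math.log(positive_fraction / (1.0 - positive_fraction))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Class fractions taken from the abstract above.
    for name, frac in [
        ("vaping-related", 0.44),
        ("FDA-compliant label", 0.03),
        ("non-compliant label", 0.04),
    ]:
        bias = initial_output_bias(frac)
        # With zero weights, the sigmoid output equals the class base rate.
        print(f"{name}: bias={bias:.3f}, initial prediction={sigmoid(bias):.3f}")
```

For a rare class such as the 3% of images with a compliant label, this starts the output near 0.03 instead of 0.5, which typically avoids spending the first training steps just learning the base rate.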

preprint · 2020 · arXiv

Charting the Landscape of Online Cryptocurrency Manipulation

Cryptocurrencies represent one of the most attractive markets for financial speculation. As a consequence, they have attracted unprecedented attention on social media. Besides genuine discussions and legitimate investment initiatives, several deceptive activities have flourished. In this work, we chart the online cryptocurrency landscape across multiple platforms. To reach our goal, we collected a large dataset, composed of more than 50M messages published by almost 7M users on Twitter, Telegram and Discord, over three months. We performed bot detection on Twitter accounts sharing invite links to Telegram and Discord channels, and we discovered that more than 56% of them were bots or suspended accounts. Then, we applied topic modeling techniques to Telegram and Discord messages, unveiling two different deception schemes - "pump-and-dump" and "Ponzi" - and identifying the channels involved in these frauds. Whereas on Discord we found a negligible level of deception, on Telegram we retrieved 296 channels involved in pump-and-dump and 432 involved in Ponzi schemes, accounting for a striking 20% of the total. Moreover, we observed that 93% of the invite links shared by Twitter bots point to Telegram pump-and-dump channels, shedding light on a little-known social bot activity. Charting the landscape of online cryptocurrency manipulation can inform actionable policies to fight such abuse.

preprint · 2020 · arXiv

Detecting multi-timescale consumption patterns from receipt data: A non-negative tensor factorization approach

Understanding consumer behavior is an important task, not only for developing marketing strategies but also for the management of economic policies. Detecting consumption patterns, however, is a high-dimensional problem in which various factors that would affect consumers' behavior need to be considered, such as consumers' demographics, circadian rhythm, seasonal cycles, etc. Here, we develop a method to extract multi-timescale expenditure patterns of consumers from a large dataset of scanned receipts. We use a non-negative tensor factorization (NTF) to detect intra- and inter-week consumption patterns at one time. The proposed method allows us to characterize consumers based on their consumption patterns that are correlated over different timescales.

preprint · 2020 · arXiv

Detecting Troll Behavior via Inverse Reinforcement Learning: A Case Study of Russian Trolls in the 2016 US Election

Since the 2016 US Presidential election, social media abuse has been eliciting massive concern in the academic community and beyond. Preventing and limiting the malicious activity of users, such as trolls and bots, in their manipulation campaigns is of paramount importance for the integrity of democracy, public health, and more. However, the automated detection of troll accounts is an open challenge. In this work, we propose an approach based on Inverse Reinforcement Learning (IRL) to capture troll behavior and identify troll accounts. We employ IRL to infer a set of online incentives that may steer user behavior, which in turn highlights behavioral differences between troll and non-troll accounts, enabling their accurate classification. As a case study, we consider the troll accounts identified by the US Congress during the investigation of Russian meddling in the 2016 US Presidential election. We report promising results: the IRL-based approach is able to accurately detect troll accounts (AUC=89.1%). The differences in the predictive features between the two classes of accounts enable a principled understanding of the distinctive behaviors reflecting the incentives trolls and non-trolls respond to.

preprint · 2020 · arXiv

Learning Behavioral Representations from Wearable Sensors

Continuous collection of physiological data from wearable sensors enables temporal characterization of individual behaviors. Understanding the relation between an individual's behavioral patterns and psychological states can help identify strategies to improve quality of life. One challenge in analyzing physiological data is extracting the underlying behavioral states from the temporal sensor signals and interpreting them. Here, we use a non-parametric Bayesian approach to model sensor data from multiple people and discover the dynamic behaviors they share. We apply this method to data collected from sensors worn by a population of hospital workers and show that the learned states can cluster participants into meaningful groups and better predict their cognitive and psychological states. This method offers a way to learn interpretable compact behavioral representations from multivariate sensor signals.

preprint · 2020 · arXiv

Learning to Reason in Round-based Games: Multi-task Sequence Generation for Purchasing Decision Making in First-person Shooters

Sequential reasoning is a complex human ability. While extensive previous research has focused on gaming AI in a single continuous game, round-based decision making that extends over a sequence of games remains less explored. Counter-Strike: Global Offensive (CS:GO), as a round-based game with abundant expert demonstrations, provides an excellent environment for multi-player round-based sequential reasoning. In this work, we propose a Sequence Reasoner with Round Attribute Encoder and Multi-Task Decoder to interpret the strategies behind the round-based purchasing decisions. We adopt few-shot learning to sample multiple rounds in a match, and use a modified version of the meta-learning algorithm Reptile for the meta-learning loop. We formulate each round as a multi-task sequence generation problem. Our state representations combine action encoder, team encoder, player features, round attribute encoder, and economy encoders to help our agent learn to reason under this specific multi-player round-based scenario. A complete ablation study and comparison with the greedy approach confirm the effectiveness of our model. Our research will open doors for interpretable AI for understanding episodic and long-term purchasing strategies beyond the gaming community.

preprint · 2020 · arXiv

Leveraging Clickstream Trajectories to Reveal Low-Quality Workers in Crowdsourced Forecasting Platforms

Crowdwork often entails tackling cognitively-demanding and time-consuming tasks. Crowdsourcing can be used for complex annotation tasks, from medical imaging to geospatial data, and such data powers sensitive applications, such as health diagnostics or autonomous driving. However, the existence and prevalence of underperforming crowdworkers is well-recognized, and can pose a threat to the validity of crowdsourcing. In this study, we propose the use of a computational framework to identify clusters of underperforming workers using clickstream trajectories. We focus on crowdsourced geopolitical forecasting. The framework can reveal different types of underperformers, such as workers with forecasts whose accuracy is far from the consensus of the crowd, those who provide low-quality explanations for their forecasts, and those who simply copy-paste their forecasts from other users. Our study suggests that clickstream clustering and analysis are fundamental tools to diagnose the performance of crowdworkers in platforms leveraging the wisdom of crowds.

preprint · 2020 · arXiv

Predictability limit of partially observed systems

Applications from finance to epidemiology and cyber-security require accurate forecasts of dynamic phenomena, which are often only partially observed. We demonstrate that a system's predictability degrades as a function of temporal sampling, regardless of the adopted forecasting model. We quantify the loss of predictability due to sampling, and show that it cannot be recovered by using external signals. We validate the generality of our theoretical findings in real-world partially observed systems representing infectious disease outbreaks, online discussions, and software development projects. On a variety of prediction tasks (forecasting new infections, the popularity of topics in online discussions, or interest in cryptocurrency projects), predictability irrecoverably decays as a function of sampling, unveiling fundamental predictability limits in partially observed systems.

preprint · 2020 · arXiv

ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research

First identified in Wuhan, China, in December 2019, the outbreak of COVID-19 has been declared as a global emergency in January, and a pandemic in March 2020 by the World Health Organization (WHO). Along with this pandemic, we are also experiencing an "infodemic" of information with low credibility such as fake news and conspiracies. In this work, we present ReCOVery, a repository designed and constructed to facilitate research on combating such information regarding COVID-19. We first broadly search and investigate ~2,000 news publishers, from which 60 are identified with extreme [high or low] levels of credibility. By inheriting the credibility of the media on which they were published, a total of 2,029 news articles on coronavirus, published from January to May 2020, are collected in the repository, along with 140,820 tweets that reveal how these news articles have spread on the Twitter social network. The repository provides multimodal information of news articles on coronavirus, including textual, visual, temporal, and network information. The way that news credibility is obtained allows a trade-off between dataset scalability and label accuracy. Extensive experiments are conducted to present data statistics and distributions, as well as to provide baseline performances for predicting news credibility so that future methods can be compared. Our repository is available at http://coronavirus-fakenews.com.

preprint · 2020 · arXiv

Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set

At the time of this writing, the novel coronavirus (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much conversation about these phenomena now occurs online, e.g., on social media platforms like Twitter. In this paper, we describe a multilingual coronavirus (COVID-19) Twitter dataset that we have been continuously collecting since January 22, 2020. We are making our dataset available to the research community (https://github.com/echen102/COVID-19-TweetIDs). It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This dataset could also help track scientific coronavirus misinformation and unverified rumors, or enable the understanding of fear and panic -- and undoubtedly more. Ultimately, this dataset may contribute towards enabling informed solutions and prescribing targeted policy interventions to fight this global crisis.

preprint · 2020 · arXiv

What Types of COVID-19 Conspiracies are Populated by Twitter Bots?

With people moving out of physical public spaces due to containment measures to tackle the novel coronavirus (COVID-19) pandemic, online platforms become even more prominent tools to understand social discussion. Studying social media can be informative to assess how we are collectively coping with this unprecedented global crisis. However, social media platforms are also populated by bots, automated accounts that can amplify certain topics of discussion at the expense of others. In this paper, we study 43.3M English tweets about COVID-19 and provide early evidence of the use of bots to promote political conspiracies in the United States, in stark contrast with humans who focus on public health concerns.

preprint · 2015 · arXiv

Adaptive Search over Sorted Sets

We revisit the classical algorithms for searching over sorted sets to introduce an algorithm refinement, called Adaptive Search, that combines the good features of Interpolation search and those of Binary search. With respect to Interpolation search, only a constant number of extra comparisons is introduced. Yet, under diverse input data distributions, our algorithm shows costs comparable to those of Interpolation search, i.e., O(log log n), while the worst-case cost is always in O(log n), as with Binary search. On benchmarks drawn from large datasets, both synthetic and real-life, Adaptive Search achieves better times and fewer memory accesses even than Santoro and Sidney's Interpolation-Binary search.
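The general idea of combining the two classical searches can be sketched as follows. This is not the Adaptive Search algorithm from the paper, whose details the abstract does not give; it is a simple hypothetical hybrid (`hybrid_search` is an illustrative name) that alternates one interpolation probe with one binary probe. Because each loop iteration includes a bisection step, the number of iterations stays in O(log n) even when the interpolation guesses are poor.

```python
from typing import Optional, Sequence


def hybrid_search(a: Sequence[int], key: int) -> Optional[int]:
    """Return an index i with a[i] == key, or None. `a` is sorted ascending."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[lo] == a[hi]:
            # Every value in the range equals a[lo], and key lies in range.
            return lo
        # Interpolation probe: guess the position from the value distribution.
        pos = lo + (key - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == key:
            return pos
        if a[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
        if lo > hi:
            break
        # Binary probe: guarantees the range at least halves per iteration.
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


if __name__ == "__main__":
    uniform = list(range(0, 1000, 3))
    skewed = [2 ** i for i in range(30)]  # hard case for pure interpolation
    print(hybrid_search(uniform, 300), hybrid_search(skewed, 2 ** 17))
```

On uniformly spaced keys the interpolation probe tends to land on or near the target immediately, while on skewed data such as powers of two the binary probe keeps the search from degrading to a linear scan.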

preprint · 2015 · arXiv

Defining and identifying Sleeping Beauties in science

A Sleeping Beauty (SB) in science refers to a paper whose importance is not recognized for several years after publication. Its citation history exhibits a long hibernation period followed by a sudden spike of popularity. Previous studies suggest a relative scarcity of SBs. The reliability of this conclusion is, however, heavily dependent on identification methods based on arbitrary threshold parameters for sleeping time and number of citations, applied to small or monodisciplinary bibliographic datasets. Here we present a systematic, large-scale, and multidisciplinary analysis of the SB phenomenon in science. We introduce a parameter-free measure that quantifies the extent to which a specific paper can be considered an SB. We apply our method to 22 million scientific papers published in all disciplines of natural and social sciences over a time span longer than a century. Our results reveal that the SB phenomenon is not exceptional. There is a continuous spectrum of delayed recognition where both the hibernation period and the awakening intensity are taken into account. Although many cases of SBs can be identified by looking at monodisciplinary bibliographic data, the SB phenomenon becomes much more apparent with the analysis of multidisciplinary datasets, where we can observe many examples of papers achieving delayed yet exceptional importance in disciplines different from those where they were originally published. Our analysis emphasizes a complex feature of citation dynamics that so far has received little attention, and also provides empirical evidence against the use of short-term citation metrics in the quantification of scientific impact.
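The abstract does not state the formula for its parameter-free measure. The sketch below follows the beauty coefficient B as we understand it from the published version of this work, and should be treated as an assumption to verify against the paper: a paper's yearly citation curve is compared with the straight line joining its citations in the publication year to its citation peak, and the normalized gaps are summed. The function name and input layout are illustrative.

```python
def beauty_coefficient(citations):
    """Sketch of a beauty coefficient B for one paper.

    `citations[t]` is the number of citations received t years after
    publication (t = 0 is the publication year). The sum runs up to the
    year of maximum citations, t_m.
    """
    if not citations:
        return 0.0
    # First year in which the yearly citation count peaks.
    t_m = max(range(len(citations)), key=citations.__getitem__)
    if t_m == 0:
        return 0.0  # peak in the publication year: no delayed recognition
    c_0, c_m = citations[0], citations[t_m]
    slope = (c_m - c_0) / t_m
    return sum(
        (slope * t + c_0 - citations[t]) / max(1, citations[t])
        for t in range(t_m + 1)
    )


if __name__ == "__main__":
    print(beauty_coefficient([0, 0, 0, 0, 10]))  # long sleep, sudden peak
    print(beauty_coefficient([0, 1, 2, 3, 4]))   # steady linear growth
```

Under this definition a paper whose citations grow linearly scores zero, a long flat period followed by a sharp peak scores high, and no thresholds for sleeping time or citation counts are needed.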

preprint2014arXiv

Analysis of a heterogeneous social network of humans and cultural objects

Modern online social platforms enable their members to be involved in a broad range of activities like making friends, joining groups, posting and commenting on resources and so on. In this paper we investigate whether a correlation emerges across the different activities a user can take part in. To perform our analysis we focused on aNobii, a social platform with a world-wide user base of book readers, who like to post their readings, give ratings, review books and discuss them with friends and fellow readers. aNobii presents a heterogeneous structure: i) part social network, with user-to-user interactions, ii) part interest network, with the management of book collections, and iii) part folksonomy, with books that are tagged by the users. We analyzed a complete and anonymized snapshot of aNobii and we focused on three specific activities a user can perform, namely her tagging behavior, her tendency to join groups and her aptitude to compile a wishlist reporting the books she is planning to read. In this way each user is associated with a tag-based, a group-based and a wishlist-based profile. Experimental analysis carried out by means of Information Theory tools like entropy and mutual information suggests that tag-based and group-based profiles are in general more informative than wishlist-based ones. Furthermore, we discover that the degree of correlation between the three profiles associated with the same user tends to be small. Hence, user profiling cannot be reduced to considering just any one type of user activity (although important) but it is crucial to incorporate multiple dimensions to effectively describe users' preferences and behavior.
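
The two Information Theory tools named in the abstract can be sketched in a few lines. The example assumes that each profile has already been reduced to one discrete label per user, which is a simplification introduced here; how the paper encodes profiles is not stated in the abstract.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy, in bits, of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned label sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

Under this reading, a mutual information close to zero between, say, tag-based and wishlist-based labels would indicate that one profile says little about the other.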

preprint2014arXiv

Evolution of Online User Behavior During a Social Upheaval

Social media represent powerful tools of mass communication and information diffusion. They played a pivotal role during recent social uprisings and political mobilizations across the world. Here we present a study of the Gezi Park movement in Turkey through the lens of Twitter. We analyze over 2.3 million tweets produced during the 25 days of protest that occurred between May and June 2013. We first characterize the spatio-temporal nature of the conversation about the Gezi Park demonstrations, showing that similarity in trends of discussion mirrors geographic cues. We then describe the characteristics of the users involved in this conversation and what roles they played. We study how roles and individual influence evolved during the period of the upheaval. This analysis reveals that the conversation becomes more democratic as events unfold, with a redistribution of influence over time in the user population. We conclude by observing how the online and offline worlds are tightly intertwined, showing that exogenous events, such as political speeches or police actions, affect social media conversations and trigger changes in individual behavior.

preprint2014arXiv

On Facebook, most ties are weak

Pervasive socio-technical networks bring new conceptual and technological challenges to developers and users alike. A central research theme is evaluation of the intensity of relations linking users and how they facilitate communication and the spread of information. These aspects of human relationships have been studied extensively in the social sciences under the framework of the "strength of weak ties" theory proposed by Mark Granovetter. Some research has considered whether that theory can be extended to online social networks like Facebook, suggesting interaction data can be used to predict the strength of ties. The approaches being used require handling user-generated data that is often not publicly available due to privacy concerns. Here, we propose an alternative definition of weak and strong ties that requires knowledge of only the topology of the social network (such as who is a friend of whom on Facebook), relying on the fact that online social networks, or OSNs, tend to fragment into communities. We thus suggest classifying as weak ties those edges linking individuals belonging to different communities and strong ties as those connecting users in the same community. We tested this definition on a large network representing part of the Facebook social graph and studied how weak and strong ties affect the information-diffusion process. Our findings suggest individuals in OSNs self-organize to create well-connected communities, while weak ties yield cohesion and optimize the coverage of information spread.
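
The topological definition of weak and strong ties given in the abstract translates directly into code. The sketch assumes a community label for every node is already available from some community detection step; the names `classify_ties`, `edges` and `community` are illustrative.

```python
def classify_ties(edges, community):
    """Split edges into (weak, strong) by community membership.

    edges:     iterable of (u, v) pairs
    community: dict mapping each node to its community label (assumed given)
    """
    weak, strong = [], []
    for u, v in edges:
        if community[u] == community[v]:
            strong.append((u, v))  # both endpoints in the same community
        else:
            weak.append((u, v))    # edge bridges two communities
    return weak, strong
```

For two triangles joined by a single bridge, only the bridge is classified as a weak tie.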

preprint2014arXiv

Optimal network modularity for information diffusion

We investigate the impact of community structure on information diffusion with the linear threshold model. Our results demonstrate that modular structure may have counter-intuitive effects on information diffusion when social reinforcement is present. We show that strong communities can facilitate global diffusion by enhancing local, intra-community spreading. Using both analytic approaches and numerical simulations, we demonstrate the existence of an optimal network modularity, where global diffusion requires the minimal number of early adopters.
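
A minimal sketch of a linear threshold cascade, assuming the common fractional form in which a node activates once the active share of its neighbors reaches its threshold. The exact variant and parameters used in the paper are not given in the abstract, so the function and argument names below are illustrative.

```python
def linear_threshold(neighbors, thresholds, seeds):
    """Run a linear threshold cascade to its fixed point.

    neighbors:  dict node -> list of neighboring nodes
    thresholds: dict node -> fraction of active neighbors needed to activate
    seeds:      initially active nodes (the early adopters)
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in active or not nbrs:
                continue
            fraction = sum(1 for n in nbrs if n in active) / len(nbrs)
            if fraction >= thresholds[node]:
                # Activation is permanent, so the final set does not
                # depend on the order in which nodes are visited.
                active.add(node)
                changed = True
    return active
```

With two triangles joined by one bridge and a threshold of 0.5, seeding two nodes of one triangle saturates that triangle but does not cross the bridge, a small instance of reinforcement helping local spreading while hindering global spreading.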

preprint2014arXiv

Quality versus quantity in scientific impact

Citation metrics are becoming pervasive in the quantitative evaluation of scholars, journals and institutions. More than ever before, hiring, promotion, and funding decisions rely on a variety of impact metrics that cannot disentangle quality from quantity of scientific output, and are biased by factors such as discipline and academic age. Biases affecting the evaluation of single papers are compounded when one aggregates citation-based metrics across an entire publication record. It is not trivial to compare the quality of two scholars who during their careers have published at different rates in different disciplines in different periods of time. We propose a novel solution based on the generation of a statistical baseline specifically tailored to the academic profile of each researcher. Our method can decouple the roles of quantity and quality of publications to explain how a certain level of impact is achieved. The method is flexible enough to allow for the evaluation of, and fair comparison among, arbitrary collections of papers (scholar publication records, journals, and entire institutions), and can be extended to simultaneously suppress any source of bias. We show that our method can capture the quality of the work of Nobel laureates irrespective of number of publications, academic age, and discipline, even when traditional metrics indicate low impact in absolute terms. We further apply our methodology to almost a million scholars and over six thousand journals to measure the impact that cannot be explained by the volume of publications alone.

preprint2014arXiv

Visualizing criminal networks reconstructed from mobile phone records

In the fight against racketeering and terrorism, knowledge about the structure and the organization of criminal networks is of fundamental importance for both the investigations and the development of efficient strategies to prevent and restrain crimes. Intelligence agencies exploit information obtained from the analysis of large amounts of heterogeneous data deriving from various information sources, including the records of phone traffic, the social networks, surveillance data, interview data, experiential police data, and police intelligence files, to acquire knowledge about criminal networks and initiate accurate and destabilizing actions. In this context, visual representation techniques coordinate the exploration of the structure of the network together with the metrics of social network analysis. Nevertheless, the utility of visualization tools may become limited when the dimension and the complexity of the system under analysis grow beyond certain limits. In this paper we show how we employ some interactive visualization techniques to represent criminal and terrorist networks reconstructed from phone traffic data, namely foci, fisheye and geo-mapping network layouts. These methods allow the exploration of the network through animated transitions among visualization models and local enlargement techniques in order to improve the comprehension of interesting areas. By combining the features of the various visualization models it is possible to gain substantial enhancements with respect to classic visualization models, which are often unreadable when the network is highly complex.

preprint2014arXiv

XML Matchers: approaches and challenges

Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated largely for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.

preprint2013arXiv

A Large-Scale Community Structure Analysis In Facebook

Understanding social dynamics that govern human phenomena, such as communications and social relationships, is a major problem in current computational social sciences. In particular, given the unprecedented success of online social networks (OSNs), in this paper we are concerned with the analysis of aggregation patterns and social dynamics occurring among users of the largest OSN to date: Facebook. In detail, we discuss the mesoscopic features of the community structure of this network, considering the perspective of the communities, which has not yet been studied on such a large scale. To this purpose, we acquired a sample of this network containing millions of users and their social relationships; then, we unveiled the communities representing the aggregation units among which users gather and interact; finally, we analyzed the statistical features of such a network of communities, discovering and characterizing some specific organization patterns followed by individuals interacting in online social networks, that emerge considering different sampling techniques and clustering methodologies. This study provides some clues of the tendency of individuals to establish social interactions in online social networks that eventually contribute to building a well-connected social structure, and opens space for further social studies.

preprint2013arXiv

A Novel Measure of Edge Centrality in Social Networks

The problem of assigning centrality values to nodes and edges in graphs has been widely investigated in recent years. Recently, a novel measure of node centrality has been proposed, called k-path centrality index, which is based on the propagation of messages inside a network along paths consisting of at most k edges. On the other hand, the importance of computing the centrality of edges has been highlighted since the 1970s by Anthonisse and, subsequently, by Girvan and Newman. In this work we propose the generalization of the concept of k-path centrality by defining the k-path edge centrality, a measure of centrality introduced to compute the importance of edges. We provide an efficient algorithm, running in O(k m), where m is the number of edges in the graph. Thus, our technique is feasible for large scale network analysis. Finally, the performance of our algorithm is analyzed, discussing the results obtained against large online social network datasets.

preprint2013arXiv

Analyzing User Behavior across Social Sharing Environments

In this work we present an in-depth analysis of the user behaviors on different Social Sharing systems. We consider three popular platforms, Flickr, Delicious and StumbleUpon, and, by combining techniques from social network analysis with techniques from semantic analysis, we characterize the tagging behavior as well as the tendency to create friendship relationships of the users of these platforms. The aim of our investigation is to see if (and how) the features and goals of a given Social Sharing system reflect on the behavior of its users and, moreover, if there exists a correlation between the social and tagging behavior of the users. We report our findings in terms of the characteristics of user profiles according to three different dimensions: (i) intensity of user activities, (ii) tag-based characteristics of user profiles, and (iii) semantic characteristics of user profiles.

preprint2013arXiv

Enhancing community detection using a network weighting strategy

A community within a network is a group of vertices densely connected to each other but less connected to the vertices outside. The problem of detecting communities in large networks plays a key role in a wide range of research areas, e.g. Computer Science, Biology and Sociology. Most of the existing algorithms to find communities count on the topological features of the network and often do not scale well on large, real-life instances. In this article we propose a strategy to enhance existing community detection algorithms by adding a pre-processing step in which edges are weighted according to their centrality with respect to the network topology. In our approach, the centrality of an edge reflects its contribution to making arbitrary graph traversals, i.e., spreading messages over the network, as short as possible. Our strategy is able to effectively complement information about network topology and it can be used as an additional tool to enhance community detection. The computation of edge centralities is carried out by performing multiple random walks of bounded length on the network. Our method makes the computation of edge centralities feasible also on large-scale networks. It has been tested in conjunction with three state-of-the-art community detection algorithms, namely the Louvain method, COPRA and OSLOM. Experimental results show that our method raises the accuracy of existing algorithms both on synthetic and real-life datasets.
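
The pre-processing step can be pictured with a simplified sketch that estimates edge weights from many random walks of bounded length. This is a plain random-walk visit count, chosen here for brevity; it is not the paper's centrality definition, and the names `edge_centrality`, `k`, `walks` and `seed` are assumptions.

```python
import random


def edge_centrality(neighbors, k, walks, seed=0):
    """Estimate edge weights as normalized random-walk traversal counts.

    neighbors: dict node -> list of neighbors (undirected graph)
    k:         maximum number of edges per walk
    walks:     number of walks to simulate
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    nodes = list(neighbors)
    counts = {}
    for _ in range(walks):
        node = rng.choice(nodes)
        for _ in range(k):
            nbrs = neighbors[node]
            if not nbrs:
                break
            nxt = rng.choice(nbrs)
            # Store undirected edges under a canonical (small, large) key.
            edge = (min(node, nxt), max(node, nxt))
            counts[edge] = counts.get(edge, 0) + 1
            node = nxt
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}
```

The resulting dictionary could then be passed as edge weights to any weighted community detection routine.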

preprint2013arXiv

Forensic Analysis of Phone Call Networks

In the context of preventing and fighting crime, the analysis of mobile phone traffic among actors of a criminal network is helpful in order to reconstruct illegal activities on the basis of the relationships connecting those specific individuals. Thus, forensic analysts and investigators require new advanced tools and techniques which allow them to manage these data in a meaningful and efficient way. In this paper we present LogAnalysis, a tool we developed to provide visual data representation and filtering, statistical analysis features and the possibility of a temporal analysis of mobile phone activities. Its adoption may help in unveiling the structure of a criminal network and the roles and dynamics of communications among its components. By using LogAnalysis, forensic investigators could deeply understand hierarchies within criminal organizations, for example discovering central members that provide connections among different sub-groups. Moreover, by analyzing the temporal evolution of the contacts among individuals, or by focusing on specific time windows, they could acquire additional insights on the data they are analyzing. Finally, we highlight how the adoption of LogAnalysis may be crucial to solve real cases, providing as examples a number of case studies inspired by real forensic investigations led by one of the authors.

preprint2013arXiv

Mixing local and global information for community detection in large networks

The problem of clustering large complex networks plays a key role in several scientific fields ranging from Biology to Sociology and Computer Science. Many approaches to clustering complex networks are based on the idea of maximizing a network modularity function. Some of these approaches can be classified as global because they exploit knowledge about the whole network topology to find clusters. Other approaches, instead, can be interpreted as local because they require only a partial knowledge of the network topology, e.g., the neighbors of a vertex. Global approaches are able to achieve high values of modularity but they do not scale well on large networks and, therefore, they cannot be applied to analyze on-line social networks like Facebook or YouTube. In contrast, local approaches are fast and scale up to large, real-life networks, at the cost of poorer results than those achieved by global methods. In this article we propose a glocal method for maximizing modularity, i.e., our method uses information at the global level, yet its scalability on large networks is comparable to that of local methods. The proposed method is called COmplex Network CLUster DEtection (or, for short, CONCLUDE). It works in two stages: in the first stage it uses an information-propagation model, based on random and non-backtracking walks of finite length, to compute the importance of each edge in keeping the network connected (called edge centrality). Then, edge centrality is used to map network vertices onto points of a Euclidean space and to compute distances between all pairs of connected vertices. In the second stage, CONCLUDE uses the distances computed in the first stage to partition the network into clusters. CONCLUDE is computationally efficient since in the average case its cost is roughly linear in the number of edges of the network.
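
The modularity function that such methods maximize can be computed as sketched below, using the standard Newman-Girvan definition for an undirected, unweighted graph without self-loops. Whether CONCLUDE uses exactly this form is not stated in the abstract; the function name and arguments are illustrative.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())
```

For two triangles joined by a bridge, splitting at the bridge gives Q = 5/14, while placing every node in a single community gives Q = 0.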

preprint2013arXiv

Scientific impact evaluation and the effect of self-citations: mitigating the bias by discounting h-index

In this paper, we propose a measure to assess scientific impact that discounts self-citations and does not require any prior knowledge of their distribution among publications. This index can be applied to both researchers and journals. In particular, we show that it fills the gap left by the h-index and similar measures that do not take into account the effect of self-citations in evaluating the impact of authors or journals. The paper provides two real-world examples: in the former, we evaluate the research impact of the most productive scholars in Computer Science (according to DBLP); in the latter, we revisit the impact of the journals ranked in the 'Computer Science Applications' section of SCImago. We observe how self-citations, in many cases, affect the rankings obtained according to different measures (including h-index and ch-index), and show how the proposed measure mitigates this effect.
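
To make the bias concrete, the sketch below computes the ordinary h-index and a naive variant that subtracts each paper's self-citations. The naive variant needs per-paper self-citation counts, which is exactly the knowledge the proposed measure is said not to require, so it serves only as a contrast and is not the index introduced in the paper. The helper names are illustrative.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    # In descending order, the condition holds for a prefix of the ranking.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def h_index_without_self(papers):
    """Naive contrast: h-index after removing known self-citations.

    papers: list of (total_citations, self_citations) pairs (assumed known)
    """
    return h_index([total - own for total, own in papers])
```

A record with citation counts 10, 8, 5, 4 and 3 has an h-index of 4; if 6, 1, 4, 0 and 0 of those citations are self-citations, the naive variant drops to 3.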

preprint2013arXiv

The Digital Evolution of Occupy Wall Street

We examine the temporal evolution of digital communication activity relating to the American anti-capitalist movement Occupy Wall Street. Using a high-volume sample from the microblogging site Twitter, we investigate changes in Occupy participant engagement, interests, and social connectivity over a fifteen-month period starting three months prior to the movement's first protest action. The results of this analysis indicate that, on Twitter, the Occupy movement tended to elicit participation from a set of highly interconnected users with pre-existing interests in domestic politics and foreign social movements. These users, while highly vocal in the months immediately following the birth of the movement, appear to have lost interest in Occupy-related communication over the remainder of the study period.

preprint2013arXiv

The Geospatial Characteristics of a Social Movement Communication Network

Social movements rely in large measure on networked communication technologies to organize and disseminate information relating to the movements' objectives. In this work we seek to understand how the goals and needs of a protest movement are reflected in the geographic patterns of its communication network, and how these patterns differ from those of stable political communication. To this end, we examine an online communication network reconstructed from over 600,000 tweets from a thirty-six week period covering the birth and maturation of the American anticapitalist movement, Occupy Wall Street. We find that, compared to a network of stable domestic political communication, the Occupy Wall Street network exhibits higher levels of locality and a hub-and-spoke structure, in which the majority of non-local attention is allocated to high-profile locations such as New York, California, and Washington D.C. Moreover, we observe that information flows across state boundaries are more likely to contain framing language and references to the media, while communication among individuals in the same state is more likely to reference protest action and specific places and times. Tying these results to social movement theory, we propose that these features reflect the movement's efforts to mobilize resources at the local level and to develop narrative frames that reinforce collective purpose at the national level.

preprint2012arXiv

Effective Retrieval of Resources in Folksonomies Using a New Tag Similarity Measure

Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. However, as tags are informally defined, continually changing, and ungoverned, it has often been criticised for lowering, rather than increasing, the efficiency of searching. To address this issue, a variety of approaches have been proposed that recommend to users which tags to use, both when labeling and when looking for resources. These techniques work well in dense folksonomies, but they fail to do so when tag usage exhibits a power law distribution, as often happens in real-life folksonomies. To tackle this issue, we propose an approach that induces the creation of a dense folksonomy, in a fully automatic and transparent way: when users label resources, an innovative tag similarity metric is deployed, so as to enrich the chosen tag set with related tags already present in the folksonomy. The proposed metric, which represents the core of our approach, is based on the mutual reinforcement principle. Our experimental evaluation proves that the accuracy and coverage of searches guaranteed by our metric are higher than those achieved by applying classical metrics.

preprint2012arXiv

Generalized Louvain Method for Community Detection in Large Networks

In this paper we present a novel strategy to discover the community structure of (possibly, large) networks. This approach is based on the well-known concept of network modularity optimization. To do so, our algorithm exploits a novel measure of edge centrality, based on the k-paths. This technique allows an edge ranking to be computed efficiently in large networks, in near linear time. Once the centrality ranking is calculated, the algorithm computes the pairwise proximity between nodes of the network. Finally, it discovers the community structure adopting a strategy inspired by the well-known state-of-the-art Louvain method (henceforth, LM), efficiently maximizing the network modularity. The experiments we carried out show that our algorithm outperforms other techniques and slightly improves results of the original LM, providing reliable results. Another advantage is that, unlike the LM, it extends naturally even to unweighted networks.

preprint2012arXiv

Measuring Similarity in Large-scale Folksonomies

Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. Unlike taxonomies, which impose a hierarchical categorisation of content, folksonomies enable end-users to freely create and choose the categories (in this case, tags) that best describe some content. However, as tags are informally defined, continually changing, and ungoverned, social tagging has often been criticised for lowering, rather than increasing, the efficiency of searching, due to the number of synonyms, homonyms, polysemy, as well as the heterogeneity of users and the noise they introduce. To address this issue, a variety of approaches have been proposed that recommend to users which tags to use, both when labelling and when looking for resources. As we illustrate in this paper, real world folksonomies are characterized by power law distributions of tags, over which commonly used similarity metrics, including the Jaccard coefficient and the cosine similarity, fail to compute. We thus propose a novel metric, specifically developed to capture similarity in large-scale folksonomies, that is based on a mutual reinforcement principle: that is, two tags are deemed similar if they have been associated with similar resources, and vice-versa two resources are deemed similar if they have been labelled by similar tags. We offer an efficient realisation of this similarity metric, and assess its quality experimentally, by comparing it against cosine similarity, on three large-scale datasets, namely Bibsonomy, MovieLens and CiteULike.
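
The two baseline metrics named in the abstract can be sketched as follows, assuming each tag is represented either by the set of resources it labels (Jaccard) or by a resource-to-count mapping (cosine). These representations are assumptions made for illustration, and the mutual reinforcement metric proposed in the paper is not reproduced here.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard coefficient of two sets of resources."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(w * v.get(r, 0) for r, w in u.items())
    norm = (sqrt(sum(w * w for w in u.values()))
            * sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0
```

Both metrics return zero whenever two tags share no resource, which is common for the rarely used tags in the long tail of a power law distribution.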

preprint2012arXiv

Topological Features of Online Social Networks

The importance of modeling and analyzing Social Networks is a consequence of the success of Online Social Networks in recent years. Several models of networks have been proposed, reflecting the different characteristics of Social Networks. Some of them are better suited to modeling specific phenomena, such as the growth and the evolution of the Social Networks; others are more appropriate to capture the topological characteristics of the networks. Because these networks show unique and different properties and features, in this work we describe and exploit several models in order to capture the structure of popular Online Social Networks, such as Arxiv, Facebook, Wikipedia and YouTube. Our experimentation aims at verifying the structural characteristics of these networks, in order to understand which model best depicts their structure, and to analyze the inner community structure, to illustrate how members of these Online Social Networks interact and group together into smaller communities.

preprint2011arXiv

A Framework for Designing 3D Virtual Environments

The process of design and development of virtual environments can be supported by tools and frameworks that save time on technical aspects and allow developers to focus on the content. In this paper we present an academic framework which provides several levels of abstraction to ease this work. It includes state-of-the-art components we devised or integrated adopting open-source solutions in order to face specific problems. Its architecture is modular and customizable, and the code is open-source.

preprint2011arXiv

Analyzing the Facebook Friendship Graph

In recent years, Online Social Networks (OSNs) have acquired a huge and increasing popularity as one of the most important emerging Web phenomena, deeply modifying the behavior of users and contributing to build a solid substrate of connections and relationships among people using the Web. In this preliminary work, our purpose is to analyze Facebook, considering a significant sample of data reflecting relationships among subscribed users. Our goal is to extract, from this platform, relevant information about the distribution of these relations and exploit tools and algorithms provided by Social Network Analysis (SNA) to discover and, possibly, understand underlying similarities between the development of OSNs and real-life social networks.

preprint2011arXiv

Automatic Wrapper Adaptation by Tree Edit Distance Matching

Information distributed through the Web keeps growing faster day by day, and for this reason, several techniques for extracting Web data have been suggested in recent years. Often, extraction tasks are performed through so-called wrappers, procedures extracting information from Web pages, e.g. implementing logic-based techniques. Many fields of application today require a strong degree of robustness of wrappers, in order not to compromise assets of information or reliability of data extracted. Unfortunately, wrappers may fail to extract data from a Web page if its structure changes, sometimes even slightly, so new techniques are required to adapt the wrapper automatically to the new structure of the page in case of failure. In this work we present a novel approach of automatic wrapper adaptation based on the measurement of similarity of trees through improved tree edit distance matching techniques.
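
One classical way to measure the similarity of two page trees is the Simple Tree Matching algorithm, sketched below. It is shown only as a well-known baseline; the improved matching technique proposed in the paper is not described in the abstract, and the tuple-based tree encoding `(label, [children])` is an assumption made for illustration.

```python
def simple_tree_matching(a, b):
    """Size of the largest order-preserving matching between two trees.

    Each tree is a (label, children) pair, with children a list of trees.
    Roots with different labels contribute no match at all.
    """
    label_a, kids_a = a
    label_b, kids_b = b
    if label_a != label_b:
        return 0
    m, n = len(kids_a), len(kids_b)
    # best[i][j]: best matching between the first i and first j subtrees.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = best[i - 1][j - 1] + simple_tree_matching(
                kids_a[i - 1], kids_b[j - 1])
            best[i][j] = max(best[i][j - 1], best[i - 1][j], paired)
    return best[m][n] + 1  # +1 accounts for the matched roots
```

If a page gains one extra element between two existing siblings, the score of the old tree against the new one still counts every original node, which suggests the old extraction path can be re-aligned to the new structure.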

preprint2011arXiv

Crawling Facebook for Social Network Analysis Purposes

We describe our work in the collection and analysis of massive data describing the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice, against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, i.e., among others, degree distribution, centrality measures, scaling laws and distribution of friendship.

preprint2011arXiv

Design of Automatically Adaptable Web Wrappers

Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches of Web data extraction, for example using techniques of artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision of information extracted from Web pages, and, at the same time, have to prove robustness in order not to compromise quality and reliability of data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoiding the failure of data extraction tasks and ensuring reliability of information extracted. Our purpose is to evaluate the performance, advantages and drawbacks of our novel system of automatic wrapper adaptation.

preprint2011arXiv

Improving Recommendation Quality by Merging Collaborative Filtering and Social Relationships

Matrix Factorization techniques have been successfully applied to raise the quality of suggestions generated by Collaborative Filtering Systems (CFSs). Traditional CFSs based on Matrix Factorization operate on the ratings provided by users and have been recently extended to incorporate demographic aspects such as age and gender. In this paper we propose to merge CFSs based on Matrix Factorization and information regarding social friendships in order to provide users with more accurate suggestions and rankings of items of interest to them. The proposed approach has been evaluated on a real-life online social network; the experimental results show an improvement against existing CFSs. A detailed comparison with related literature is also presented.

preprint2011arXiv

Intelligent Self-Repairable Web Wrappers

The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust algorithms for Web data mining that can automatically cope with possible malfunctions or failures. On the other, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data could be caused, for example, by structural modifications made to data sources by their owners. Nowadays, verification of data integrity and maintenance are mostly handled manually, in order to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to create procedures able to extract data from Web sources -- the so-called Web wrappers -- which can cope with possible malfunctions caused by modifications of the structure of the data source, and can automatically repair themselves.

preprint2011arXiv

Rendering of 3D Dynamic Virtual Environments

In this paper we present a framework for the rendering of dynamic 3D virtual environments which can be integrated in the development of videogames. It includes methods to manage sounds and particle effects, paged static geometries, support for a physics engine and various input systems. It has been designed with a modular structure to allow future expansions. We exploited some open-source state-of-the-art components such as OGRE, PhysX, ParticleUniverse, etc.; all of them have been properly integrated to obtain peculiar physical and environmental effects. The stand-alone version of the application is fully compatible with the Direct3D and OpenGL APIs and adopts the OpenAL APIs to manage audio cards. Finally, we devised a showcase demo which reproduces a dynamic 3D environment, including some particular effects: the alternation of day and night influencing the lighting of the scene, the rendering of terrain, water and vegetation, and the reproduction of sounds and atmospheric agents.

preprint2010arXiv

Living City, a Collaborative Browser-based Massively Multiplayer Online Game

This work presents the design and implementation of our Browser-based Massively Multiplayer Online Game, Living City, a simulation game fully developed at the University of Messina. Living City is a persistent and real-time digital world, running in the Web browser environment and accessible to users without any client-side installation. Today Massively Multiplayer Online Games attract the attention of Computer Scientists both for their architectural peculiarity and for their close interconnection with the social network phenomenon. We cover these two aspects, paying particular attention to several parts of the project: game balancing (e.g., algorithms behind time and money balancing); business logic (e.g., handling concurrency, cheating avoidance and availability); and, finally, social and psychological aspects involved in the collaboration of players, analyzing their activities and interconnections.

53 published item(s)

Gendered Pathways in AI Companionship: Cross-Community Behavior and Toxicity Patterns on Reddit

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

The Generative AI Paradox: GenAI and the Erosion of Trust, the Corrosion of Information Verification, and the Demise of Truth

Social Bots: Detection and Challenges

Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network Data

Botometer 101: Social bot practicum for computational social scientists

Construction of Large-Scale Misinformation Labeled Datasets from Social Media Discourse using Label Refinement

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Individual and Collective Performance Deteriorate in a New Team: A Case Study of CS:GO Tournaments

Individualized Context-Aware Tensor Factorization for Online Games Predictions

Social Bots and Social Media Manipulation in 2020: The Year in Review

Tracking e-cigarette warning label compliance on Instagram with deep learning

Charting the Landscape of Online Cryptocurrency Manipulation

Detecting multi-timescale consumption patterns from receipt data: A non-negative tensor factorization approach

Detecting Troll Behavior via Inverse Reinforcement Learning: A Case Study of Russian Trolls in the 2016 US Election

Learning Behavioral Representations from Wearable Sensors

Learning to Reason in Round-based Games: Multi-task Sequence Generation for Purchasing Decision Making in First-person Shooters

Leveraging Clickstream Trajectories to Reveal Low-Quality Workers in Crowdsourced Forecasting Platforms

Predictability limit of partially observed systems

ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research

Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set

What Types of COVID-19 Conspiracies are Populated by Twitter Bots?

Adaptive Search over Sorted Sets

Defining and identifying Sleeping Beauties in science

Analysis of a heterogeneous social network of humans and cultural objects

Evolution of Online User Behavior During a Social Upheaval

On Facebook, most ties are weak

Optimal network modularity for information diffusion

Quality versus quantity in scientific impact

Visualizing criminal networks reconstructed from mobile phone records

XML Matchers: approaches and challenges

A Large-Scale Community Structure Analysis In Facebook

A Novel Measure of Edge Centrality in Social Networks

Analyzing User Behavior across Social Sharing Environments

Enhancing community detection using a network weighting strategy

Forensic Analysis of Phone Call Networks

Mixing local and global information for community detection in large networks

Scientific impact evaluation and the effect of self-citations: mitigating the bias by discounting h-index

The Digital Evolution of Occupy Wall Street

The Geospatial Characteristics of a Social Movement Communication Network

Effective Retrieval of Resources in Folksonomies Using a New Tag Similarity Measure

Generalized Louvain Method for Community Detection in Large Networks

Measuring Similarity in Large-scale Folksonomies

Topological Features of Online Social Networks

A Framework for Designing 3D Virtual Environments

Analyzing the Facebook Friendship Graph

Automatic Wrapper Adaptation by Tree Edit Distance Matching

Crawling Facebook for Social Network Analysis Purposes

Design of Automatically Adaptable Web Wrappers

Improving Recommendation Quality by Merging Collaborative Filtering and Social Relationships

Intelligent Self-Repairable Web Wrappers

Rendering of 3D Dynamic Virtual Environments

Living City, a Collaborative Browser-based Massively Multiplayer Online Game