Researcher profile

Roberto Di Pietro

Roberto Di Pietro contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
23works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

23 published item(s)

preprint2026arXiv

Cross-Cultural Transfer of Emoji Semantics and Sentiment in Financial Social Media

Emojis are widely used in online financial communication, but it is unclear whether they provide transferable sentiment signals across languages, platforms, and asset communities. This study examines the extent to which emoji usage, semantics, and sentiment polarity remain stable across financial communities, and how these layers influence zero-shot sentiment transfer. Using large corpora of Twitter and StockTwits posts in four languages, we measure cross-community divergence and evaluate sentiment models trained under emoji-only, text-only, and text+emoji inputs. We find that emoji frequencies differ across communities, especially across languages, but their semantics and sentiment polarity are largely stable. Cross-asset transferability shows minimal degradation, while cross-language transfer remains the most challenging. Including emojis consistently reduces transfer gaps relative to text-only models. These results indicate that financial communication exhibits a partially shared ``emoji code,'' and that emojis provide compact, language-independent sentiment cues that improve model generalization across markets and platforms.

preprint2026arXiv

FinMoji: A Framework for Emoji-driven Sentiment Analysis in Financial Social Media

This paper explores the use of emojis in financial sentiment analysis, focusing on the social media platform StockTwits. Emojis, increasingly prevalent in digital communication, have potential as compact indicators of investor sentiment, which can be critical for predicting market trends. Our study examines whether emojis alone can serve as reliable proxies for financial sentiment and how they compare with traditional text-based analysis. We conduct a series of experiments using logistic regression and transformer models. We further analyze the performance, computational efficiency, and data requirements of emoji-based versus text-based sentiment classification. Using a balanced dataset of about 528,000 emoji-containing StockTwits posts, we find that emoji-only models achieve F1 approximately 0.75, lower than text-emoji combined models, which achieve F1 approximately 0.88, but with far lower computational cost. This is a useful feature in time-sensitive settings such as high-frequency trading. Furthermore, certain emojis and emoji pairs exhibit strong predictive power for market sentiment, demonstrating over 90 percent accuracy in predicting bullish or bearish trends. Finally, our research reveals large statistical differences in emoji usage between financial and general social media contexts, stressing the need for domain-specific sentiment analysis models.

preprint2026arXiv

PumpSense: Real-Time Detection and Target Extraction of Crypto Pump-and-Dumps on Telegram

Cryptocurrency pump-and-dump schemes coordinated via Telegram threaten market integrity. However, existing research addressing this specific threat has not yet produced solutions that combine reliable results with fast response. This is in part due to the absence of publicly available, message-level labeled data, as well as design choices. In this paper, we address both issues. In particular, we introduce a corpus of over 280,000 Telegram posts from 39 pump-organizing groups, all manually reviewed to identify 2,246 pump announcements and their targeted cryptocurrency and exchange. Leveraging this dataset, we define two tasks: real-time pump-announcement detection and target cryptocurrency/exchange extraction. For detection, we compare two machine-learning models: a lightweight tree-based LightGBM classifier (F1=0.79, latency=9.4 s/sample) and a transformer-based BGE-M3 (F1=0.83, latency=50 ms/sample). With our proposed approach, we show that message analysis can achieve near-instant pump detection at the level of individual Telegram message windows. Unlike prior work that relies purely on market data and typically detects pumps tens of seconds after abnormal trading activity is observed, our method operates directly on the coordination messages themselves and can be evaluated in microseconds per window on commodity hardware. To our knowledge, we also establish the first benchmark for manipulated coin and exchange extraction. We demonstrate that traditional rule-based extraction methods, widely relied upon in prior literature, are ineffective due to ticker ambiguity. In contrast, LLMs achieve the highest accuracy with a score of 0.91.

preprint2022arXiv

Analysis of Polkadot: Architecture, Internals, and Contradictions

Polkadot is a network protocol launched in 2020 with the ambition of unlocking the full potential of blockchain technologies. Its novel multi-chain protocol allows arbitrary data to be transferred across heterogeneous blockchains, enabling the implementation of a wide range of novel use cases. The Polkadot architecture is based on the principles of sharding, which promises to solve scalability and interoperability shortcomings that encumber many existing blockchain-based systems. Lured by these impressive features, investors immediately appreciated the Polkadot project, which is now firmly ranked among the top 10 cryptocurrencies by capitalization (around 20 Billions USD). However, Polkadot has not received the same level of attention from academia that other proposals in the crypto domain have received so far, like Bitcoin, Ethereum, and Algorand, to cite a few. Polkadot architecture is described and discussed only in the grey literature, and very little is known about its internals. In this paper, we provide the first systematic study on the Polkadot environment, detailing its protocols, governance, and economic model. Then, we identify several limitations -- supported by an empirical analysis of its ledger -- that could severely affect the scalability and overall security of the network. Finally, based on our analysis, we provide future directions to inspire researchers to investigate further the Polkadot ecosystem and its pitfalls in terms of performance, security, and network aspects.

preprint2022arXiv

Content Privacy Enforcement Models in Decentralized Online Social Networks: State of Play, Solutions, Limitations, and Future Directions

In recent years, Decentralized Online Social Networks (DOSNs) have been attracting the attention of many users because they reduce the risk of censorship, surveillance, and information leakage from the service provider. In contrast to the most popular Online Social Networks, which are based on centralized architectures (e.g., Facebook, Twitter, or Instagram), DOSNs are not based on a single service provider acting as a central authority. Indeed, the contents that are published on DOSNs are stored on the devices made available by their users, which cooperate to execute the tasks needed to provide the service. To continuously guarantee their availability, the contents published by a user could be stored on the devices of other users, simply because they are online when required. Consequently, such contents must be properly protected by the DOSN infrastructure, in order to ensure that they can be really accessed only by users who have the permission of the publishers. As a consequence, DOSNs require efficient solutions for protecting the privacy of the contents published by each user with respect to the other users of the social network. In this paper, we investigate and compare the principal content privacy enforcement models adopted by current DOSNs evaluating their suitability to support different types of privacy policies based on user groups. Such evaluation is carried out by implementing several models and comparing their performance for the typical operations performed on groups, i.e., content publish, user join and leave. Further, we also highlight the limitations of current approaches and show future research directions. This contribution, other than being interesting on its own, provides a blueprint for researchers and practitioners interested in implementing DOSNs, and also highlights a few open research directions.

preprint2022arXiv

Metaverse: Security and Privacy Issues

The metaverse promises a host of bright opportunities for business, economics, and society. Though, a number of critical aspects are still to be considered and the analysis of their impact is almost non-existent. In this paper, we provide several contributions. We start by analysing the foundations of the metaverse, later we focus on the novel privacy and security issues introduced by this new paradigm, and finally we broaden the scope of the contribution highlighting some of the far-reaching yet logical implications of the metaverse on a number of domains, not all of them in tech. Throughout the paper, we also discuss possible research directions. We believe that the provided holistic view on the foundations, technology, and issues related to the metaverse-with a focus on security and privacy-, other than being an interesting contribution on its own, could also pave the way for a few multidisciplinary research avenues.

preprint2022arXiv

Satellite-Based Communications Security: A Survey of Threats, Solutions, and Research Challenges

Satellite-based Communication systems are gaining renewed momentum in Industry and Academia, thanks to innovative services introduced by leading tech companies and the promising impact they can deliver towards the global connectivity objective tackled by early 6G initiatives. On the one hand, the emergence of new manufacturing processes and radio technologies promises to reduce service costs while guaranteeing outstanding communication latency, available bandwidth, flexibility, and coverage range. On the other hand, cybersecurity techniques and solutions applied in SATCOM links should be updated to reflect the substantial advancements in attacker capabilities characterizing the last two decades. However, business urgency and opportunities are leading operators towards challenging system trade-offs, resulting in an increased attack surface and a general relaxation of the available security services. In this paper, we tackle the cited problems and present a comprehensive survey on the link-layer security threats, solutions, and challenges faced when deploying and operating SATCOM systems.Specifically, we classify the literature on security for SATCOM systems into two main branches, i.e., physical-layer security and cryptography schemes.Then, we further identify specific research domains for each of the identified branches, focusing on dedicated security issues, including, e.g., physical-layer confidentiality, anti-jamming schemes, anti-spoofing strategies, and quantum-based key distribution schemes. For each of the above domains, we highlight the most essential techniques, peculiarities, advantages, disadvantages, lessons learned, and future directions.Finally, we also identify emerging research topics whose additional investigation by Academia and Industry could further attract researchers and investors, ultimately unleashing the full potential behind ubiquitous satellite communications.

preprint2022arXiv

Serverless Computing: A Security Perspective

Serverless Computing is a virtualisation-related paradigm that promises to simplify application management and to solve the last challenges in the field: scale down and easy to use. The implied cost reduction, coupled with a simplified management of underlying applications, are expected to further push the adoption of virtualisation-based solutions, including cloud-computing or telco-cloud solutions. However, in this quest for efficiency, security is not ranked among the top priorities, also because of the (misleading) belief that current solutions developed for virtualised environments could be applied (as is) to this new paradigm. Unfortunately, this is not the case, due to the highlighted idiosyncratic features of serverless computing. In this paper, we review the current serverless architectures, abstract and categorise their founding principles, and provide an in depth analyse of them from the point of view of security, referring to principles and practices of the cybersecurity domain. In particular, we show the security shortcomings of the analysed serverless architectural paradigms, point to possible countermeasures, and highlight a few research directions.

preprint2021arXiv

SpreadMeNot: A Provably Secure and Privacy-Preserving Contact Tracing Protocol

A plethora of contact tracing apps have been developed and deployed in several countries around the world in the battle against Covid-19. However, people are rightfully concerned about the security and privacy risks of such applications. To this end, the contribution of this work is twofold. First, we present an in-depth analysis of the security and privacy characteristics of the most prominent contact tracing protocols, under both passive and active adversaries. The results of our study indicate that all protocols are vulnerable to a variety of attacks, mainly due to the deterministic nature of the underlying cryptographic protocols. Our second contribution is the design and implementation of SpreadMeNot, a novel contact tracing protocol that can defend against most passive and active attacks, thus providing strong (provable) security and privacy guarantees that are necessary for such a sensitive application. Our detailed analysis, both formal and experimental, shows that SpreadMeNot satisfies security, privacy, and performance requirements, hence being an ideal candidate for building a contact tracing solution that can be adopted by the majority of the general public, as well as to serve as an open-source reference for further developments in the field.

preprint2020arXiv

A Longitudinal Study on Web-sites Password Management (in)Security: Evidence and Remedies

Single-factor password-based authentication is generally the norm to access on-line Web-sites. While single-factor authentication is well known to be a weak form of authentication, a further concern arises when considering the possibility for an attacker to recover the user passwords by leveraging the loopholes in the password recovery mechanisms. Indeed, the adoption by a Web-site of a poor password management system makes useless even the most robust password chosen by the registered users. In this paper, building on the results of our previous work, we study the possible attacks to on-line password recovery systems analyzing the mechanisms implemented by some of the most popular Web-sites. In detail, we provide several contributions: (i) we revise and detail the attacker model; (ii) we provide an updated analysis with respect to a preliminary study we carried out in December 2017; (iii) we perform a brand new analysis of the current top 200 Alexa's Web-sites of five major EU countries; and, (iv) we propose \sol, a working open-source module that could be adopted by any Web-site to provide registered users with a password recovery mechanism to prevent mail service provider-level attacks. Overall, it is striking to notice how the analyzed Web-sites have made little (if any) effort to become compliant with the GDPR regulation, showing that the objective to have basic user protection mechanisms in place---despite the fines threatened by GDPR---is still far, mainly because of sub-standard security management practices. Finally, it is worth noting that while this study has been focused on EU registered Web-sites, the proposed solution has, instead, general applicability.

preprint2020arXiv

A Survey on Computational Propaganda Detection

Propaganda campaigns aim at influencing people's mindset with the purpose of advancing a specific agenda. They exploit the anonymity of the Internet, the micro-profiling ability of social networks, and the ease of automatically creating and managing coordinated networks of accounts, to reach millions of social network users with persuasive messages, specifically targeted to topics each individual user is sensitive to, and ultimately influencing the outcome on a targeted issue. In this survey, we review the state of the art on computational propaganda detection from the perspective of Natural Language Processing and Network Analysis, arguing about the need for combined efforts between these communities. We further discuss current challenges and future research directions.

preprint2020arXiv

BAD: Blockchain Anomaly Detection

Anomaly detection tools play a role of paramount importance in protecting networks and systems from unforeseen attacks, usually by automatically recognizing and filtering out anomalous activities. Over the years, different approaches have been designed, all focused on lowering the false positive rate. However, no proposal has addressed attacks targeting blockchain-based systems. In this paper we present BAD: the first Blockchain Anomaly Detection solution. BAD leverages blockchain meta-data, named forks, in order to collect potentially malicious activities in the network/system. BAD enjoys the following features: (i) it is distributed (thus avoiding any central point of failure), (ii) it is tamper-proof (making not possible for a malicious software to remove or to alter its own traces), (iii) it is trusted (any behavioral data is collected and verified by the majority of the network) and (iv) it is private (avoiding any third party to collect/analyze/store sensitive information). Our proposal is validated via both experimental results and theoretical complexity analysis, that highlight the quality and viability of our Blockchain Anomaly Detection solution.

preprint2020arXiv

BrokenStrokes: On the (in)Security of Wireless Keyboards

Wireless devices resorting to event-triggered communications have been proved to suffer critical privacy issues, due to the intrinsic leakage associated with radio-frequency (RF) emissions. In this paper, we move the attack frontier forward by proposing BrokenStrokes: an inexpensive, easy to implement, efficient, and effective attack able to detect the typing of a pre-defined keyword by only eavesdropping the communication channel used by the wireless keyboard. BrokenStrokes proves itself to be a particularly dreadful attack: it achieves its goal when the eavesdropping antenna is up to 15 meters from the target keyboard, regardless of the encryption scheme, the communication protocol, the presence of radio noise, and the presence of physical obstacles. While we detail the attack in three current scenarios and discuss its striking performance--its success probability exceeds 90% in normal operating conditions--, we also provide some suggestions on how to mitigate it. The data utilized in this paper have been released as open-source to allow practitioners, industries, and academia to verify our claims and use them as a basis for further developments.

preprint2020arXiv

Cryptomining Makes Noise: a Machine Learning Approach for Cryptojacking Detection

A new cybersecurity attack,where an adversary illicitly runs crypto-mining software over the devices of unaware users, is emerging in both the literature and in the wild . This attack, known as cryptojacking, has proved to be very effective given the simplicity of running a crypto-client into a target device. Several countermeasures have recently been proposed, with different features and performance, but all characterized by a host-based architecture. This kind of solutions, designed to protect the individual user, are not suitable for efficiently protecting a corporate network, especially against insiders. In this paper, we propose a network-based approach to detect and identify crypto-clients activities by solely relying on the network traffic, even when encrypted. First, we provide a detailed analysis of the real network traces generated by three major cryptocurrencies, Bitcoin, Monero, and Bytecoin, considering both the normal traffic and the one shaped by a VPN. Then, we propose Crypto-Aegis, a Machine Learning (ML) based framework built over the results of our investigation, aimed at detecting cryptocurrencies related activities, e.g., pool mining, solo mining, and active full nodes. Our solution achieves a striking 0.96 of F1-score and 0.99 of AUC for the ROC, while enjoying a few other properties, such as device and infrastructure independence. Given the extent and novelty of the addressed threat we believe that our approach, supported by its excellent results, pave the way for further research in this area.

preprint2020arXiv

Foundations, Properties, and Security Applications of Puzzles: A Survey

Cryptographic algorithms have been used not only to create robust ciphertexts but also to generate cryptograms that, contrary to the classic goal of cryptography, are meant to be broken. These cryptograms, generally called puzzles, require the use of a certain amount of resources to be solved, hence introducing a cost that is often regarded as a time delay---though it could involve other metrics as well, such as bandwidth. These powerful features have made puzzles the core of many security protocols, acquiring increasing importance in the IT security landscape. The concept of a puzzle has subsequently been extended to other types of schemes that do not use cryptographic functions, such as CAPTCHAs, which are used to discriminate humans from machines. Overall, puzzles have experienced a renewed interest with the advent of Bitcoin, which uses a CPU-intensive puzzle as proof of work. In this paper, we provide a comprehensive study of the most important puzzle construction schemes available in the literature, categorizing them according to several attributes, such as resource type, verification type, and applications. We have redefined the term puzzle by collecting and integrating the scattered notions used in different works, to cover all the existing applications. Moreover, we provide an overview of the possible applications, identifying key requirements and different design approaches. Finally, we highlight the features and limitations of each approach, providing a useful guide for the future development of new puzzle schemes.

preprint2020arXiv

GNSS Spoofing Detection via Opportunistic IRIDIUM Signals

In this paper, we study the privately-own IRIDIUM satellite constellation, to provide a location service that is independent of the GNSS. In particular, we apply our findings to propose a new GNSS spoofing detection solution, exploiting unencrypted IRIDIUM Ring Alert (IRA) messages that are broadcast by IRIDIUM satellites. We firstly reverse-engineer many parameters of the IRIDIUM satellite constellation, such as the satellites speed, packet interarrival times, maximum satellite coverage, satellite pass duration, and the satellite beam constellation, to name a few. Later, we adopt the aforementioned statistics to create a detailed model of the satellite network. Subsequently, we propose a solution to detect unintended deviations of a target user from his path, due to GNSS spoofing attacks. We show that our solution can be used efficiently and effectively to verify the position estimated from standard GNSS satellite constellation, and we provide constraints and parameters to fit several application scenarios. All the results reported in this paper, while showing the quality and viability of our proposal, are supported by real data. In particular, we have collected and analyzed hundreds of thousands of IRA messages, thanks to a measurement campaign lasting several days. All the collected data ($1000+$ hours) have been made available to the research community. Our solution is particularly suitable for unattended scenarios such as deserts, rural areas, or open seas, where standard spoofing detection techniques resorting to crowd-sourcing cannot be used due to deployment limitations. Moreover, contrary to competing solutions, our approach does not resort to physical-layer information, dedicated hardware, or multiple receiving stations, while exploiting only a single receiving antenna and publicly-available IRIDIUM transmissions. Finally, novel research directions are also highlighted.

preprint2020arXiv

Noise2Weight: On Detecting Payload Weight from Drones Acoustic Emissions

The increasing popularity of autonomous and remotely-piloted drones have paved the way for several use-cases, e.g., merchandise delivery and surveillance. In many scenarios, estimating with zero-touch the weight of the payload carried by a drone before its physical approach could be attractive, e.g., to provide an early tampering detection. In this paper, we investigate the possibility to remotely detect the weight of the payload carried by a commercial drone by analyzing its acoustic fingerprint. We characterize the difference in the thrust needed by the drone to carry different payloads, resulting in significant variations of the related acoustic fingerprint. We applied the above findings to different use-cases, characterized by different computational capabilities of the detection system. Results are striking: using the Mel-Frequency Cepstral Coefficients (MFCC) components of the audio signal and different Support Vector Machine (SVM) classifiers, we achieved a minimum classification accuracy of 98% in the detection of the specific payload class carried by the drone, using an acquisition time of 0.25 s---performances improve when using longer time acquisitions. All the data used for our analysis have been released as open-source, to enable the community to validate our findings and use such data as a ready-to-use basis for further investigations.

preprint2020arXiv

Security in Energy Harvesting Networks: A Survey of Current Solutions and Research Challenges

The recent advancements in hardware miniaturization capabilities have boosted the diffusion of systems based on Energy Harvesting (EH) technologies, as a means to power embedded wireless devices in a sustainable and low-cost fashion. Despite the undeniable management advantages, the intermittent availability of the energy source and the limited power supply has led to challenging system trade-offs, resulting in an increased attack surface and a general relaxation of the available security services. In this paper, we survey the security issues, applications, techniques, and challenges arising in wireless networks powered via EH technologies. We explore the vulnerabilities of EH networks, and we provide a comprehensive overview of the scientific literature, including attack vectors, cryptography techniques, physical-layer security schemes for data secrecy, and additional physical-layer countermeasures. For each of the identified macro-areas, we compare the scientific contributions across a range of shared features, indicating the pros and cons of the described techniques, the research challenges, and a few future directions. Finally, we also provide an overview of the emerging topics in the area, such as Non-Orthogonal Multiple Access (NOMA) and Rate-Splitting Multiple Access (RSMA) schemes, and Intelligent Reconfigurable Surfaces, that could trigger the interest of industry and academia and unleash the full potential of pervasive EH wireless networks.

preprint2020arXiv

Short-Range Audio Channels Security: Survey of Mechanisms, Applications, and Research Challenges

Short-range audio channels have a few distinguishing characteristics: ease of use, low deployment costs, and easy to tune frequencies, to cite a few. Moreover, thanks to their seamless adaptability to the security context, many techniques and tools based on audio signals have been recently proposed. However, while the most promising solutions are turning into valuable commercial products, acoustic channels are increasingly used also to launch attacks against systems and devices, leading to security concerns that could thwart their adoption. To provide a rigorous, scientific, security-oriented review of the field, in this paper we survey and classify methods, applications, and use-cases rooted on short-range audio channels for the provisioning of security services---including Two-Factor Authentication techniques, pairing solutions, device authorization strategies, defense methodologies, and attack schemes. Moreover, we also point out the strengths and weaknesses deriving from the use of short-range audio channels. Finally, we provide open research issues in the context of short-range audio channels security, calling for contributions from both academia and industry.

preprint2020arXiv

Vessels Cybersecurity: Issues, Challenges, and the Road Ahead

Vessels cybersecurity is recently gaining momentum, as a result of a few recent attacks to vessels at sea. These recent attacks have shacked the maritime domain, which was thought to be relatively immune to cyber threats. The cited belief is now over, as proved by recent mandates issued by the International Maritime Organization (IMO). According to these regulations, all vessels should be the subject of a cybersecurity risk analysis, and technical controls should be adopted to mitigate the resulting risks. This initiative is laudable since, despite the recent incidents, the vulnerabilities and threats affecting modern vessels are still unclear to operating entities, leaving the potential for dreadful consequences of further attacks just a matter of "when", not "if". In this contribution, we investigate and systematize the major security weaknesses affecting systems and communication technologies adopted in modern vessels. Specifically, we describe the architecture and main features of the different systems, pointing out their main security issues, and specifying how they were exploited by attackers to cause service disruption and relevant financial losses. We also identify a few countermeasures to the introduced attacks. Finally, we highlight a few research challenges to be addressed by industry and academia to strengthen vessels security.

preprint2017arXiv

Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambots detection can be achieved via an in-depth analysis of their collective behaviors exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate among spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users behaviors and to efficiently rely on a limited number of lightweight account characteristics.

preprint2016arXiv

DNA-inspired online behavioral modeling and its application to spambot detection

We propose a strikingly novel, simple, and effective approach to model online user behavior: we extract and analyze digital DNA sequences from user online actions and we use Twitter as a benchmark to test our proposal. We obtain an incisive and compact DNA-inspired characterization of user actions. Then, we apply standard DNA analysis techniques to discriminate between genuine and spambot accounts on Twitter. An experimental campaign supports our proposal, showing its effectiveness and viability. To the best of our knowledge, we are the first ones to identify and adapt DNA-inspired techniques to online user behavioral modeling. While Twitter spambot detection is a specific use case on a specific social media, our proposed methodology is platform and technology agnostic, hence paving the way for diverse behavioral characterization tasks.

preprint2015arXiv

Fame for sale: efficient detection of fake Twitter followers

$\textit{Fake followers}$ are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere - hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter accounts detection. Second, we create a baseline dataset of verified human and fake follower accounts. Such baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both in terms of reduction of overfitting and cost for gathering the data needed to compute the features. The final result is a novel $\textit{Class A}$ classifier, general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. We ultimately perform an information fusion-based sensitivity analysis, to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, other than being supported by a thorough experimental methodology and interesting on their own, also pave the way for further investigation on the novel issue of fake Twitter followers.