Source author record

Marco Mellia

Marco Mellia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Cryptography and Security cs.CY Social and Information Networks Artificial Intelligence Distributed, Parallel, and Cluster Computing Human-Computer Interaction

Catalog footprint

What is connected

10works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

The Potential of Erroneous Outbound Traffic Analysis to Unveil Silent Internal Anomalies

Passive measurement has traditionally focused on inbound traffic to detect malicious activity, based on the assumption that threats originate externally. In this paper, we offer a complementary perspective by examining outbound traffic, and argue that a narrow subset -- what we term erroneous outbound traffic -- is a lighter and revealing yet overlooked data source for identifying a broad range of security threats and network problems. This traffic consists of packets sent by internal hosts that either receive no response, trigger ICMP errors, or are ICMP error messages themselves generated in response to unsolicited requests. To demonstrate its potential, we collect and analyse erroneous traffic from a large network, uncovering a variety of previously unnoticed issues, including misconfigurations, obsolete deployments and compromised hosts.

preprint2022arXiv

On the Dynamics of Political Discussions on Instagram: A Network Perspective

Instagram has been increasingly used as a source of information especially among the youth. As a result, political figures now leverage the platform to spread opinions and political agenda. We here analyze online discussions on Instagram, notably in political topics, from a network perspective. Specifically, we investigate the emergence of communities of co-commenters, that is, groups of users who often interact by commenting on the same posts and may be driving the ongoing online discussions. In particular, we are interested in salient co-interactions, i.e., interactions of co-commenters that occur more often than expected by chance and under independent behavior. Unlike casual and accidental co-interactions which normally happen in large volumes, salient co-interactions are key elements driving the online discussions and, ultimately, the information dissemination. We base our study on the analysis of 10 weeks of data centered around major elections in Brazil and Italy, following both politicians and other celebrities. We extract and characterize the communities of co-commenters in terms of topological structure, properties of the discussions carried out by community members, and how some community properties, notably community membership and topics, evolve over time. We show that communities discussing political topics tend to be more engaged in the debate by writing longer comments, using more emojis, hashtags and negative words than in other subjects. Also, communities built around political discussions tend to be more dynamic, although top commenters remain active and preserve community membership over time. Moreover, we observe a great diversity in discussed topics over time: whereas some topics attract attention only momentarily, others, centered around more fundamental political discussions, remain consistently active over time.

preprint2022arXiv

The Internet with Privacy Policies: Measuring The Web Upon Consent

To protect users' privacy, legislators have regulated the usage of tracking technologies, mandating the acquisition of users' consent before collecting data. Consequently, websites started showing more and more consent management modules -- i.e., Privacy Banners -- the visitors have to interact with to access the website content. They challenge the automatic collection of Web measurements, primarily to monitor the extensiveness of tracking technologies but also to measure Web performance in the wild. Privacy Banners in fact limit crawlers from observing the actual website content. In this paper, we present a thorough measurement campaign focusing on popular websites in Europe and the US, visiting both landing and internal pages from different countries around the world. We engineer Priv-Accept, a Web crawler able to accept the privacy policies, as most users would do in practice. This let us compare how webpages change before and after. Our results show that all measurements performed not dealing with the Privacy Banners offer a very biased and partial view of the Web. After accepting the privacy policies, we observe an increase of up to 70 trackers, which in turn slows down the webpage load time by a factor of 2x-3x.

preprint2020arXiv

A Survey on Big Data for Network Traffic Monitoring and Analysis

Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms require to quickly identify and react to unpredictable events while processing millions of heterogeneous events. At last, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. Those are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopt big data approaches to understand to what extent the potential of big data is being explored in NTMA. This survey mainly focuses on approaches and technologies to manage the big NTMA data, additionally briefly discussing big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned, and research directions.

preprint2020arXiv

Campus Traffic and e-Learning during COVID-19 Pandemic

The COVID-19 pandemic led to the adoption of severe measures to counteract the spread of the infection. Social distancing and lockdown measures modifies people's habits, while the Internet gains a major role to support remote working, e-teaching, online collaboration, gaming, video streaming, etc. All these sudden changes put unprecedented stress on the network. In this paper we analyze the impact of the lockdown enforcement on the Politecnico di Torino campus network. Right after the school shutdown on the 25th of February, PoliTO deployed its own in-house solution for virtual teaching. Ever since, the university provides about 600 virtual classes daily, serving more than 16,000 students per day. Here, we report a picture of how the pandemic changed PoliTO's network traffic. We first focus on the usage of remote working and collaborative platforms. Given the peculiarity of PoliTO in-house online teaching solution, we drill down on it, characterizing both the audience and the network footprint. Overall, we present a snapshot of the abrupt changes on campus traffic and learning due to COVID-19, and testify how the Internet has proved robust to successfully cope with challenges and maintain the university operations.

preprint2020arXiv

EXPLAIN-IT: Towards Explainable AI for Unsupervised Network Traffic Analysis

The application of unsupervised learning approaches, and in particular of clustering techniques, represents a powerful exploration means for the analysis of network measurements. Discovering underlying data characteristics, grouping similar measurements together, and identifying eventual patterns of interest are some of the applications which can be tackled through clustering. Being unsupervised, clustering does not always provide precise and clear insight into the produced output, especially when the input data structure and distribution are complex and difficult to grasp. In this paper we introduce EXPLAIN-IT, a methodology which deals with unlabeled data, creates meaningful clusters, and suggests an explanation to the clustering results for the end-user. EXPLAIN-IT relies on a novel explainable Artificial Intelligence (AI) approach, which allows to understand the reasons leading to a particular decision of a supervised learning-based model, additionally extending its application to the unsupervised learning domain. We apply EXPLAIN-IT to the problem of YouTube video quality classification under encrypted traffic scenarios, showing promising results.

preprint2020arXiv

IPPO: A Privacy-Aware Architecture for Decentralized Data-sharing

Online trackers personalize ads campaigns, exponentially increasing their efficacy compared to traditional channels. The downside of this is that thousands of mostly unknown systems own our profiles and violate our privacy without our awareness. IPPO turns the table and re-empower users of their data, through anonymised data publishing via a Blockchain-based Decentralized Data Marketplace. We also propose a service based on machine learning and big data analytics to automatically identify web trackers and build Privacy Labels (PLs), based on the nutrition labels concept. This paper describes the motivation, the vision, the architecture and the research challenges related to IPPO.

preprint2016arXiv

WeBrowse: Mining HTTP logs online for network-based content recommendation

A powerful means to help users discover new content in the overwhelming amount of information available today is sharing in online communities such as social networks or crowdsourced platforms. This means comes short in the case of what we call communities of a place: people who study, live or work at the same place. Such people often share common interests but either do not know each other or fail to actively engage in submitting and relaying information. To counter this effect, we propose passive crowdsourced content discovery, an approach that leverages the passive observation of web-clicks as an indication of users' interest in a piece of content. We design, implement, and evaluate WeBrowse , a passive crowdsourced system which requires no active user engagement to promote interesting content to users of a community of a place. Instead, it extracts the URLs users visit from traffic traversing a network link to identify popular and interesting pieces of information. We first prototype WeBrowse and evaluate it using both ground-truths and real traces from a large European Internet Service Provider. Then, we deploy WeBrowse in a campus of 15,000 users, and in a neighborhood. Evaluation based on our deployments shows the feasibility of our approach. The majority of WeBrowse's users welcome the quality of content it promotes. Finally, our analysis of popular topics across different communities confirms that users in the same community of a place share common interests, compared to users from different communities, thus confirming the promise of WeBrowse's approach.

preprint2015arXiv

CrowdSurf: Empowering Informed Choices in the Web

When surfing the Internet, individuals leak personal and corporate information to third parties whose (legitimate or not) businesses revolve around the value of collected data. The implications are serious, from a person unwillingly exposing private information to an unknown third party, to a company unable to manage the flow of its information to the outside world. The point is that individuals and companies are more and more kept out of the loop when it comes to control private data. With the goal of empowering informed choices in information leakage through the Internet, we propose CROWDSURF, a system for comprehensive and collaborative auditing of data that flows to Internet services. Similarly to open-source efforts, we enable users to contribute in building awareness and control over privacy and communication vulnerabilities. CROWDSURF provides the core infrastructure and algorithms to let individuals and enterprises regain control on the information exposed on the web. We advocate CROWDSURF as a data processing layer positioned right below HTTP in the host protocol stack. This enables the inspection of clear-text data even when HTTPS is deployed and the application of processing rules that are customizable to fit any need. Preliminary results obtained executing a prototype implementation on ISP traffic traces demonstrate the feasibility of CROWDSURF.

preprint2015arXiv

YouLighter: An Unsupervised Methodology to Unveil YouTube CDN Changes

YouTube relies on a massively distributed Content Delivery Network (CDN) to stream the billions of videos in its catalogue. Unfortunately, very little information about the design of such CDN is available. This, combined with the pervasiveness of YouTube, poses a big challenge for Internet Service Providers (ISPs), which are compelled to optimize end-users' Quality of Experience (QoE) while having no control on the CDN decisions. This paper presents YouLighter, an unsupervised technique to identify changes in the YouTube CDN. YouLighter leverages only passive measurements to cluster co-located identical caches into edge-nodes. This automatically unveils the structure of YouTube's CDN. Further, we propose a new metric, called Constellation Distance, that compares the clustering obtained from two different time snapshots, to pinpoint sudden changes. While several approaches allow comparison between the clustering results from the same dataset, no technique allows to measure the similarity of clusters from different datasets. Hence, we develop a novel methodology, based on the Constellation Distance, to solve this problem. By running YouLighter over 10-month long traces obtained from two ISPs in different countries, we pinpoint both sudden changes in edge-node allocation, and small alterations to the cache allocation policies which actually impair the QoE that the end-users perceive.

Marco Mellia

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

The Potential of Erroneous Outbound Traffic Analysis to Unveil Silent Internal Anomalies

On the Dynamics of Political Discussions on Instagram: A Network Perspective

The Internet with Privacy Policies: Measuring The Web Upon Consent

A Survey on Big Data for Network Traffic Monitoring and Analysis

Campus Traffic and e-Learning during COVID-19 Pandemic

EXPLAIN-IT: Towards Explainable AI for Unsupervised Network Traffic Analysis

IPPO: A Privacy-Aware Architecture for Decentralized Data-sharing

WeBrowse: Mining HTTP logs online for network-based content recommendation

CrowdSurf: Empowering Informed Choices in the Web

YouLighter: An Unsupervised Methodology to Unveil YouTube CDN Changes