Researcher profile

Evangelos P. Markatos

Evangelos P. Markatos contributes to research discovery and scholarly infrastructure.

Published work

5 published item(s)

Preprint, 2022, arXiv

Leveraging Google's Publisher-specific IDs to Detect Website Administration

Digital advertising is the most popular way for content monetization on the Internet. Publishers spawn new websites, and older ones change hands with the sole purpose of monetizing user traffic. In this ever-evolving ecosystem, it is challenging to effectively answer questions such as: Which entities monetize what websites? What categories of websites does an average entity typically monetize on and how diverse are these websites? How has this website administration ecosystem changed across time? In this paper, we propose a novel, graph-based methodology to detect administration of websites on the Web, by exploiting the ad-related publisher-specific IDs. We apply our methodology across the top 1 million websites and study the characteristics of the created graphs of website administration. Our findings show that approximately 90% of the websites are associated each with a single publisher, and that small publishers tend to manage less popular websites. We perform a historical analysis of up to 8 million websites, and find a new, constantly rising number of (intermediary) publishers that control and monetize traffic from hundreds of websites, seeking a share of the ad-market pie. We also observe that over time, websites tend to move from big to smaller administrators.
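As a rough illustration of the graph-based idea in this abstract, the sketch below groups websites that share at least one ad-related publisher ID into connected components. It is not the paper's implementation: the function name, the input format, and the example domains and IDs are all hypothetical, and it assumes the IDs have already been extracted from each site.

```python
from collections import defaultdict


def group_sites_by_publisher_id(site_ids):
    """Group websites that share at least one publisher ID.

    site_ids maps a website name to the set of publisher IDs seen on it
    (hypothetical input format). Returns a list of sets of websites, one
    set per connected component of the "shares an ID" graph.
    """
    parent = {site: site for site in site_ids}

    def find(x):
        # Walk to the component root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every site to the first site on which each ID was seen.
    first_site_for_id = {}
    for site, ids in site_ids.items():
        for pub_id in ids:
            if pub_id in first_site_for_id:
                union(first_site_for_id[pub_id], site)
            else:
                first_site_for_id[pub_id] = site

    groups = defaultdict(set)
    for site in site_ids:
        groups[find(site)].add(site)
    return list(groups.values())


# Hypothetical example data; the IDs are made up for illustration.
example = {
    "news-a.example": {"pub-1111"},
    "news-b.example": {"pub-1111", "pub-2222"},
    "blog-c.example": {"pub-2222"},
    "shop-d.example": {"pub-3333"},
}
groups = group_sites_by_publisher_id(example)
```

Here the first three sites end up in one group because they are chained together by shared IDs, while the fourth stays on its own. A real analysis would also need to decide how to treat IDs that belong to intermediaries rather than to the site owner.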

Preprint, 2022, arXiv

Measuring the (Over)use of Service Workers for In-Page Push Advertising Purposes

Rich offline experience, periodic background sync, push notification functionality, network request control, and improved performance via request caching are only a few of the functionalities provided by the Service Worker (SW) API. This new technology, supported by all major browsers, can significantly improve users' experience by providing the publisher with technical foundations that would normally require a native application. Despite the capabilities of this new technique and its important role in the ecosystem of Progressive Web Apps (PWAs), it is still unclear what the actual purpose of SWs on the web is, and how publishers leverage the provided functionality in their web applications. In this study, we shed light on the real-world deployment of SWs by conducting the first large-scale analysis of the prevalence of SWs in the wild. We see that SWs are becoming more and more popular, with adoption increasing by 26% within the last 5 months alone. Surprisingly, despite their rich capabilities, we see that SWs are mostly used for In-Page Push Advertising, in 65.08% of the SWs that connect with 3rd parties. We highlight that this is a relatively new way for advertisers to bypass ad-blockers and render ads natively on users' displays.

Preprint, 2022, arXiv

User Tracking in the Post-cookie Era: How Websites Bypass GDPR Consent to Track Users

During the past few years, mostly as a result of the GDPR and the CCPA, websites have started to present users with cookie consent banners. These banners are web forms where users can state their preference and declare which cookies they would like to accept, if such an option exists. Although requesting consent before storing any identifiable information is a good start towards respecting user privacy, previous research has shown that websites do not always respect user choices. Furthermore, considering the ever-decreasing reliance of trackers on cookies and the actions browser vendors take to block or restrict third-party cookies, we anticipate a world where stateless tracking emerges, either because trackers or websites do not use cookies, or because users simply refuse to accept any. In this paper, we explore whether websites use more persistent and sophisticated forms of tracking in order to track users who said they do not want cookies. Such forms of tracking include first-party ID leaking, ID synchronization, and browser fingerprinting. Our results suggest that websites do use such modern forms of tracking even before users have had the opportunity to register their choice with respect to cookies. To add insult to injury, when users choose to raise their voice and reject all cookies, user tracking only intensifies. As a result, users' choices play very little role with respect to tracking: we measured that more than 75% of tracking activities happened before users had the opportunity to make a selection in the cookie consent banner, or when users chose to reject all cookies.

Preprint, 2022, arXiv

YouTubers Not madeForKids: Detecting Channels Sharing Inappropriate Videos Targeting Children

In recent years, hundreds of new YouTube channels have been creating and sharing videos targeting children, with themes related to animation, superhero movies, comics, etc. Unfortunately, many of these videos are inappropriate for consumption by their target audience, due to disturbing, violent, or sexual scenes. In this paper, we study YouTube channels found to post suitable or disturbing videos targeting kids in the past. We identify a clear discrepancy between what YouTube assumes and flags as inappropriate content and channels, and what is found to be disturbing content that is still available on the platform, targeting kids. In particular, we find that almost 60% of videos that were manually annotated and classified as disturbing by an older study in 2019 (a collection bootstrapped with Elsa and other keywords related to children's videos) are still available on YouTube in mid 2021. In the meantime, 44% of channels that uploaded such disturbing videos have yet to be suspended and their videos to be removed. For the first time in the literature, we also study the "madeForKids" flag, a new feature that YouTube introduced at the end of 2019, and compare its application to the channels that shared disturbing videos, as flagged by the previous study. Apparently, these channels are less likely to be set as "madeForKids" than those sharing suitable content. In addition, channels posting disturbing videos utilize their channel features, such as keywords, description, topics, posts, etc., to appeal to kids (e.g., using game-related keywords). Finally, we use a collection of such channel and content features to train ML classifiers able to detect, at channel creation time, when a channel will be related to disturbing content uploads. These classifiers can help YouTube moderators reduce such incidents, pointing to potentially suspicious accounts without analyzing actual videos.

Preprint, 2020, arXiv

Cookie Synchronization: Everything You Always Wanted to Know But Were Afraid to Ask

User data is the primary input of digital advertising, fueling the free Internet as we know it. As a result, web companies invest a lot in elaborate tracking mechanisms to acquire user data that they can sell to data markets and advertisers. However, with the same-origin policy, and cookies as a primary identification mechanism on the web, each tracker knows the same user with a different ID. To mitigate this, Cookie Synchronization (CSync) came to the rescue, facilitating an information sharing channel between third parties that may or may not have direct access to the website the user visits. In the background, with CSync, they merge user data they own, but also reconstruct a user's browsing history, bypassing the same-origin policy. In this paper, we perform what is, to our knowledge, the first in-depth study of CSync in the wild, using a year-long weblog from 850 real mobile users. Through our study, we aim to understand the characteristics of the CSync protocol and the impact it has on web users' privacy. For this, we design and implement CONRAD, a holistic mechanism to detect CSync events in real time, and the privacy loss on the user side, even when the synced IDs are obfuscated. Using CONRAD, we find that 97% of the regular web users are exposed to CSync: most of them within the first week of their browsing, and the median userID gets leaked, on average, to 3.5 different domains. Finally, we see that CSync increases the number of domains that track the user by a factor of 6.75.
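To make the notion of a CSync event concrete, the sketch below flags requests in which an ID that one domain stored as a cookie shows up as a URL parameter in a request to a different domain. This is a simplified heuristic and not CONRAD: it only matches plaintext IDs (the abstract notes that CONRAD also handles obfuscated ones), and the function name, input format, length threshold, and example domains are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl


def detect_id_sharing(cookie_ids, request_urls, min_len=8):
    """Flag requests that carry one domain's cookie ID to another domain.

    cookie_ids maps a domain to the set of ID strings it set as cookies
    (hypothetical input format). request_urls is an iterable of URLs.
    Returns a list of (owner_domain, receiver_host, id_value) tuples.
    """
    # Index long-enough ID values by the domain that owns them; short
    # values are skipped to avoid matching ordinary parameters.
    owner_of = {}
    for domain, ids in cookie_ids.items():
        for value in ids:
            if len(value) >= min_len:
                owner_of[value] = domain

    events = []
    for url in request_urls:
        parts = urlsplit(url)
        receiver = parts.hostname
        if receiver is None:
            continue
        for _, value in parse_qsl(parts.query):
            owner = owner_of.get(value)
            if owner is None:
                continue
            # Requests to the owner itself or its subdomains are not sharing.
            if receiver == owner or receiver.endswith("." + owner):
                continue
            events.append((owner, receiver, value))
    return events


# Hypothetical example data.
cookie_ids = {"tracker-a.example": {"a1b2c3d4e5"}}
urls = [
    "https://sync.tracker-b.example/match?partner=a&uid=a1b2c3d4e5",
    "https://cdn.tracker-a.example/pixel?uid=a1b2c3d4e5",
    "https://tracker-b.example/ad?size=300x250",
]
events = detect_id_sharing(cookie_ids, urls)
```

Only the first URL is flagged, since it delivers the ID owned by tracker-a.example to a host under tracker-b.example. The second request stays within the owner's own domain, and the third carries no known ID.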