Source author record

Claude Castelluccia

Claude Castelluccia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security cs.CY Networking and Internet Architecture Artificial Intelligence Databases Human-Computer Interaction Machine Learning

Catalog footprint

What is connected

16works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Taking Advice from (Dis)Similar Machines: The Impact of Human-Machine Similarity on Machine-Assisted Decision-Making

Machine learning algorithms are increasingly used to assist human decision-making. When the goal of machine assistance is to improve the accuracy of human decisions, it might seem appealing to design ML algorithms that complement human knowledge. While neither the algorithm nor the human are perfectly accurate, one could expect that their complementary expertise might lead to improved outcomes. In this study, we demonstrate that in practice decision aids that are not complementary, but make errors similar to human ones may have their own benefits. In a series of human-subject experiments with a total of 901 participants, we study how the similarity of human and machine errors influences human perceptions of and interactions with algorithmic decision aids. We find that (i) people perceive more similar decision aids as more useful, accurate, and predictable, and that (ii) people are more likely to take opposing advice from more similar decision aids, while (iii) decision aids that are less similar to humans have more opportunities to provide opposing advice, resulting in a higher influence on people's decisions overall.

preprint2021arXiv

Constrained Differentially Private Federated Learning for Low-bandwidth Devices

Federated learning becomes a prominent approach when different entities want to learn collaboratively a common model without sharing their training data. However, Federated learning has two main drawbacks. First, it is quite bandwidth inefficient as it involves a lot of message exchanges between the aggregating server and the participating entities. This bandwidth and corresponding processing costs could be prohibitive if the participating entities are, for example, mobile devices. Furthermore, although federated learning improves privacy by not sharing data, recent attacks have shown that it still leaks information about the training data. This paper presents a novel privacy-preserving federated learning scheme. The proposed scheme provides theoretical privacy guarantees, as it is based on Differential Privacy. Furthermore, it optimizes the model accuracy by constraining the model learning phase on few selected weights. Finally, as shown experimentally, it reduces the upstream and downstream bandwidth by up to 99.9% compared to standard federated learning, making it practical for mobile systems.

preprint2020arXiv

DESIRE: A Third Way for a European Exposure Notification System Leveraging the best of centralized and decentralized systems

This document presents an evolution of the ROBERT protocol that decentralizes most of its operations on the mobile devices. DESIRE is based on the same architecture than ROBERT but implements major privacy improvements. In particular, it introduces the concept of Private Encounter Tokens, that are secret and cryptographically generated, to encode encounters. In the DESIRE protocol, the temporary Identifiers that are broadcast on the Bluetooth interfaces are generated by the mobile devices providing more control to the users about which ones to disclose. The role of the server is merely to match PETs generated by diagnosed users with the PETs provided by requesting users. It stores minimal pseudonymous data. Finally, all data that are stored on the server are encrypted using keys that are stored on the mobile devices, protecting against data breach on the server. All these modifications improve the privacy of the scheme against malicious users and authority. However, as in the first version of ROBERT, risk scores and notifications are still managed and controlled by the server of the health authority, which provides high robustness, flexibility, and efficacy.

preprint2020arXiv

Techniques d'anonymisation tabulaire : concepts et mise en oeuvre

In this document, we present a state of the art of anonymization techniques for classical tabular datasets. This article is geared towards a general public having some knowledge of mathematics and computer science, but with no need for specific knowledge in anonymization. The objective of this document it to explain anonymization concepts in order to be able to sanitize a dataset and compute reindentification risk. The document contains a large number of examples to help understand the calculations. ----- Dans ce document, nous présentons l'état de l'art des techniques d'anonymisation pour des bases de données classiques (i.e. des tables), à destination d'un public technique ayant une formation universitaire de base en mathématiques et informatique, mais non spécialiste. L'objectif de ce document est d'expliquer les concepts permettant de réaliser une anonymisation de données tabulaires, et de calculer les risques de réidentification. Le document est largement composé d'exemples permettant au lecteur de comprendre comment mettre en oeuvre les calculs.

preprint2016arXiv

MyAdChoices: Bringing Transparency and Control to Online Advertising

The intrusiveness and the increasing invasiveness of online advertising have, in the last few years, raised serious concerns regarding user privacy and Web usability. As a reaction to these concerns, we have witnessed the emergence of a myriad of ad-blocking and anti-tracking tools, whose aim is to return control to users over advertising. The problem with these technologies, however, is that they are extremely limited and radical in their approach: users can only choose either to block or allow all ads. With around 200 million people regularly using these tools, the economic model of the Web ---in which users get content free in return for allowing advertisers to show them ads--- is at serious peril. In this paper, we propose a smart Web technology that aims at bringing transparency to online advertising, so that users can make an informed and equitable decision regarding ad blocking. The proposed technology is implemented as a Web-browser extension and enables users to exert fine-grained control over advertising, thus providing them with certain guarantees in terms of privacy and browsing experience, while preserving the Internet economic model. Experimental results in a real environment demonstrate the suitability and feasibility of our approach, and provide preliminary findings on behavioral targeting from real user browsing profiles.

preprint2016arXiv

MyTrackingChoices: Pacifying the Ad-Block War by Enforcing User Privacy Preferences

Free content and services on the Web are often supported by ads. However, with the proliferation of intrusive and privacy-invasive ads, a significant proportion of users have started to use ad blockers. As existing ad blockers are radical (they block all ads) and are not designed taking into account their economic impact, ad-based economic model of the Web is in danger today. In this paper, we target privacy-sensitive users and provide them with fine-grained control over tracking. Our working assumption is that some categories of web pages (for example, related to health, religion, etc.) are more privacy-sensitive to users than others (education, science, etc.). Therefore, our proposed approach consists in providing users with an option to specify the categories of web pages that are privacy-sensitive to them and block trackers present on such web pages only. As tracking is prevented by blocking network connections of third-party domains, we avoid not only tracking but also third-party ads. Since users will continue receiving ads on web pages belonging to non-sensitive categories, our approach essentially provides a trade-off between privacy and economy. To test the viability of our solution, we implemented it as a Google Chrome extension, named MyTrackingChoices (available on Chrome Web Store). Our real-world experiments with MyTrackingChoices show that the economic impact of ad blocking exerted by privacy-sensitive users can be significantly reduced.

preprint2016arXiv

Near-Optimal Fingerprinting with Constraints

Several recent studies have demonstrated that people show large behavioural uniqueness. This has serious privacy implications as most individuals become increasingly re-identifiable in large datasets or can be tracked while they are browsing the web using only a couple of their attributes, called as their fingerprints. Often, the success of these attacks depend on explicit constraints on the number of attributes learnable about individuals, i.e., the size of their fingerprints. These constraints can be budget as well as technical constraints imposed by the data holder. For instance, Apple restricts the number of applications that can be called by another application on iOS in order to mitigate the potential privacy threats of leaking the list of installed applications on a device. In this work, we address the problem of identifying the attributes (e.g., smartphone applications) that can serve as a fingerprint of users given constraints on the size of the fingerprint. We give the best fingerprinting algorithms in general, and evaluate their effectiveness on several real-world datasets. Our results show that current privacy guards limiting the number of attributes that can be queried about individuals is insufficient to mitigate their potential privacy risks in many practical cases.

preprint2015arXiv

On the Unicity of Smartphone Applications

Prior works have shown that the list of apps installed by a user reveal a lot about user interests and behavior. These works rely on the semantics of the installed apps and show that various user traits could be learnt automatically using off-the-shelf machine-learning techniques. In this work, we focus on the re-identifiability issue and thoroughly study the unicity of smartphone apps on a dataset containing 54,893 Android users collected over a period of 7 months. Our study finds that any 4 apps installed by a user are enough (more than 95% times) for the re-identification of the user in our dataset. As the complete list of installed apps is unique for 99% of the users in our dataset, it can be easily used to track/profile the users by a service such as Twitter that has access to the whole list of installed apps of users. As our analyzed dataset is small as compared to the total population of Android users, we also study how unicity would vary with larger datasets. This work emphasizes the need of better privacy guards against collection, use and release of the list of installed apps.

preprint2014arXiv

Retargeting Without Tracking

Retargeting ads are increasingly prevalent on the Internet as their effectiveness has been shown to outperform conventional targeted ads. Retargeting ads are not only based on users' interests, but also on their intents, i.e. commercial products users have shown interest in. Existing retargeting systems heavily rely on tracking, as retargeting companies need to know not only the websites a user has visited but also the exact products on these sites. They are therefore very intrusive, and privacy threatening. Furthermore, these schemes are still sub-optimal since tracking is partial, and they often deliver ads that are obsolete (because, for example, the targeted user has already bought the advertised product). This paper presents the first privacy-preserving retargeting ads system. In the proposed scheme, the retargeting algorithm is distributed between the user and the advertiser such that no systematic tracking is necessary, more control and transparency is provided to users, but still a lot of targeting flexibility is provided to advertisers. We show that our scheme, that relies on homomorphic encryption, can be efficiently implemented and trivially solves many problems of existing schemes, such as frequency capping and ads freshness.

preprint2013arXiv

When Privacy meets Security: Leveraging personal information for password cracking

Passwords are widely used for user authentication and, despite their weaknesses, will likely remain in use in the foreseeable future. Human-generated passwords typically have a rich structure, which makes them susceptible to guessing attacks. In this paper, we study the effectiveness of guessing attacks based on Markov models. Our contributions are two-fold. First, we propose a novel password cracker based on Markov models, which builds upon and extends ideas used by Narayanan and Shmatikov (CCS 2005). In extensive experiments we show that it can crack up to 69% of passwords at 10 billion guesses, more than all probabilistic password crackers we compared again t. Second, we systematically analyze the idea that additional personal information about a user helps in speeding up password guessing. We find that, on average and by carefully choosing parameters, we can guess up to 5% more passwords, especially when the number of attempts is low. Furthermore, we show that the gain can go up to 30% for passwords that are actually based on personal attributes. These passwords are clearly weaker and should be avoided. Our cracker could be used by an organization to detect and reject them. To the best of our knowledge, we are the first to systematically study the relationship between chosen passwords and users' personal information. We test and validate our results over a wide collection of leaked password databases.

preprint2012arXiv

DREAM: DiffeRentially privatE smArt Metering

This paper presents a new privacy-preserving smart metering system. Our scheme is private under the differential privacy model and therefore provides strong and provable guarantees. With our scheme, an (electricity) supplier can periodically collect data from smart meters and derive aggregated statistics while learning only limited information about the activities of individual households. For example, a supplier cannot tell from a user's trace when he watched TV or turned on heating. Our scheme is simple, efficient and practical. Processing cost is very limited: smart meters only have to add noise to their data and encrypt the results with an efficient stream cipher.

preprint2011arXiv

EphPub: Toward Robust Ephemeral Publishing

The increasing amount of personal and sensitive information disseminated over the Internet prompts commensurately growing privacy concerns. Digital data often lingers indefinitely and users lose its control. This motivates the desire to restrict content availability to an expiration time set by the data owner. This paper presents and formalizes the notion of Ephemeral Publishing (EphPub), to prevent the access to expired content. We propose an efficient and robust protocol that builds on the Domain Name System (DNS) and its caching mechanism. With EphPub, sensitive content is published encrypted and the key material is distributed, in a steganographic manner, to randomly selected and independent resolvers. The availability of content is then limited by the evanescence of DNS cache entries. The EphPub protocol is transparent to existing applications, and does not rely on trusted hardware, centralized servers, or user proactive actions. We analyze its robustness and show that it incurs a negligible overhead on the DNS infrastructure. We also perform a large-scale study of the caching behavior of 900K open DNS resolvers. Finally, we propose Firefox and Thunderbird extensions that provide ephemeral publishing capabilities, as well as a command-line tool to create ephemeral files.

preprint2011arXiv

How Unique and Traceable are Usernames?

Suppose you find the same username on different online services, what is the probability that these usernames refer to the same physical person? This work addresses what appears to be a fairly simple question, which has many implications for anonymity and privacy on the Internet. One possible way of estimating this probability would be to look at the public information associated to the two accounts and try to match them. However, for most services, these information are chosen by the users themselves and are often very heterogeneous, possibly false and difficult to collect. Furthermore, several websites do not disclose any additional public information about users apart from their usernames (e.g., discus- sion forums or Blog comments), nonetheless, they might contain sensitive information about users. This paper explores the possibility of linking users profiles only by looking at their usernames. The intuition is that the probability that two usernames refer to the same physical person strongly depends on the "entropy" of the username string itself. Our experiments, based on crawls of real web services, show that a significant portion of the users' profiles can be linked using their usernames. To the best of our knowledge, this is the first time that usernames are considered as a source of information when profiling users on the Internet.

preprint2011arXiv

One Bad Apple Spoils the Bunch: Exploiting P2P Applications to Trace and Profile Tor Users

Tor is a popular low-latency anonymity network. However, Tor does not protect against the exploitation of an insecure application to reveal the IP address of, or trace, a TCP stream. In addition, because of the linkability of Tor streams sent together over a single circuit, tracing one stream sent over a circuit traces them all. Surprisingly, it is unknown whether this linkability allows in practice to trace a significant number of streams originating from secure (i.e., proxied) applications. In this paper, we show that linkability allows us to trace 193% of additional streams, including 27% of HTTP streams possibly originating from "secure" browsers. In particular, we traced 9% of Tor streams carried by our instrumented exit nodes. Using BitTorrent as the insecure application, we design two attacks tracing BitTorrent users on Tor. We run these attacks in the wild for 23 days and reveal 10,000 IP addresses of Tor users. Using these IP addresses, we then profile not only the BitTorrent downloads but also the websites visited per country of origin of Tor users. We show that BitTorrent users on Tor are over-represented in some countries as compared to BitTorrent users outside of Tor. By analyzing the type of content downloaded, we then explain the observed behaviors by the higher concentration of pornographic content downloaded at the scale of a country. Finally, we present results suggesting the existence of an underground BitTorrent ecosystem on Tor.

preprint2010arXiv

Compromising Tor Anonymity Exploiting P2P Information Leakage

Privacy of users in P2P networks goes far beyond their current usage and is a fundamental requirement to the adoption of P2P protocols for legal usage. In a climate of cold war between these users and anti-piracy groups, more and more users are moving to anonymizing networks in an attempt to hide their identity. However, when not designed to protect users information, a P2P protocol would leak information that may compromise the identity of its users. In this paper, we first present three attacks targeting BitTorrent users on top of Tor that reveal their real IP addresses. In a second step, we analyze the Tor usage by BitTorrent users and compare it to its usage outside of Tor. Finally, we depict the risks induced by this de-anonymization and show that users' privacy violation goes beyond BitTorrent traffic and contaminates other protocols such as HTTP.

preprint2010arXiv

Private Information Disclosure from Web Searches. (The case of Google Web History)

As the amount of personal information stored at remote service providers increases, so does the danger of data theft. When connections to remote services are made in the clear and authenticated sessions are kept using HTTP cookies, data theft becomes extremely easy to achieve. In this paper, we study the architecture of the world's largest service provider, i.e., Google. First, with the exception of a few services that can only be accessed over HTTPS (e.g., Gmail), we find that many Google services are still vulnerable to simple session hijacking. Next, we present the Historiographer, a novel attack that reconstructs the web search history of Google users, i.e., Google's Web History, even though such a service is supposedly protected from session hijacking by a stricter access control policy. The Historiographer uses a reconstruction technique inferring search history from the personalized suggestions fed by the Google search engine. We validate our technique through experiments conducted over real network traffic and discuss possible countermeasures. Our attacks are general and not only specific to Google, and highlight privacy concerns of mixed architectures using both secure and insecure connections.

Claude Castelluccia

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Taking Advice from (Dis)Similar Machines: The Impact of Human-Machine Similarity on Machine-Assisted Decision-Making

Constrained Differentially Private Federated Learning for Low-bandwidth Devices

DESIRE: A Third Way for a European Exposure Notification System Leveraging the best of centralized and decentralized systems

Techniques d'anonymisation tabulaire : concepts et mise en oeuvre

MyAdChoices: Bringing Transparency and Control to Online Advertising

MyTrackingChoices: Pacifying the Ad-Block War by Enforcing User Privacy Preferences

Near-Optimal Fingerprinting with Constraints

On the Unicity of Smartphone Applications

Retargeting Without Tracking

When Privacy meets Security: Leveraging personal information for password cracking

DREAM: DiffeRentially privatE smArt Metering

EphPub: Toward Robust Ephemeral Publishing

How Unique and Traceable are Usernames?

One Bad Apple Spoils the Bunch: Exploiting P2P Applications to Trace and Profile Tor Users

Compromising Tor Anonymity Exploiting P2P Information Leakage

Private Information Disclosure from Web Searches. (The case of Google Web History)