Source author record

Carmela Troncoso

Carmela Troncoso appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Cryptography and Security · Computers and Society (cs.CY) · Artificial Intelligence · Distributed, Parallel, and Cluster Computing · Machine Learning · Software Engineering

Catalog footprint

What is connected

10 works

6 topics

4 close collaborators


Preprint · 2022 · arXiv

Synthetic Data -- Anonymisation Groundhog Day

Synthetic data has been advertised as a silver-bullet solution to privacy-preserving data publishing that addresses the shortcomings of traditional anonymisation techniques. The promise is that synthetic data drawn from generative models preserves the statistical properties of the original dataset but, at the same time, provides perfect protection against privacy attacks. In this work, we present the first quantitative evaluation of the privacy gain of synthetic data publishing and compare it to that of previous anonymisation techniques. Our evaluation of a wide range of state-of-the-art generative models demonstrates that synthetic data either does not prevent inference attacks or does not retain data utility. In other words, we empirically show that synthetic data does not provide a better tradeoff between privacy and utility than traditional anonymisation techniques. Furthermore, in contrast to traditional anonymisation, the privacy-utility tradeoff of synthetic data publishing is hard to predict. Because it is impossible to predict what signals a synthetic dataset will preserve and what information will be lost, synthetic data leads to a highly variable privacy gain and unpredictable utility loss. In summary, we find that synthetic data is far from the holy grail of privacy-preserving data publishing.

Preprint · 2020 · arXiv

DatashareNetwork: A Decentralized Privacy-Preserving Search Engine for Investigative Journalists

Investigative journalists collect large numbers of digital documents during their investigations. These documents can greatly benefit other journalists' work. However, many of these documents contain sensitive information. Hence, possessing such documents can endanger reporters, their stories, and their sources. Consequently, many documents are used only for single, local investigations. We present DatashareNetwork, a decentralized and privacy-preserving search system that enables journalists worldwide to find documents via a dedicated network of peers. DatashareNetwork combines well-known anonymous authentication mechanisms and anonymous communication primitives, a novel asynchronous messaging system, and a novel multi-set private set intersection protocol (MS-PSI) into a *decentralized peer-to-peer private document search engine*. We prove that DatashareNetwork is secure and show, using a prototype implementation, that it scales to thousands of users and millions of documents.
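The abstract does not spell out MS-PSI. For orientation only, the sketch below shows the classic two-party Diffie-Hellman-style private set intersection that many PSI protocols build on: each side raises hashed items to a secret exponent, and because exponents commute, doubly blinded values match exactly when the underlying items match. This is not the DatashareNetwork protocol. The modulus, the hash-to-group mapping, and the function names are illustrative assumptions, and the toy group is not secure for real use.

```python
import hashlib
import secrets

# Toy group: multiplicative group mod the Mersenne prime 2^127 - 1.
# Illustration only; real systems use a prime-order elliptic-curve group.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hash an item (e.g. a keyword) to a nonzero element mod P (hypothetical mapping)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values, key):
    """Raise each group element to a secret exponent."""
    return [pow(v, key, P) for v in values]


def dh_psi(client_items, server_items):
    """Return the client's items that also appear in the server's set.

    Both secrets live in one process here for brevity; in a real run each
    party keeps its exponent and only blinded values cross the network.
    """
    a = secrets.randbelow(P - 2) + 1  # client's secret exponent
    b = secrets.randbelow(P - 2) + 1  # server's secret exponent

    # Client -> server: H(x)^a. Server -> client: (H(x)^a)^b.
    client_once = blind([hash_to_group(x) for x in client_items], a)
    client_twice = blind(client_once, b)

    # Server -> client: H(y)^b. Client computes (H(y)^b)^a.
    server_once = blind([hash_to_group(y) for y in server_items], b)
    server_twice = set(blind(server_once, a))

    # Exponents commute, so H(x)^(ab) == H(y)^(ba) when x == y; mismatches
    # collide only with negligible probability.
    return {x for x, v in zip(client_items, client_twice) if v in server_twice}
```

Calling `dh_psi(["alpha", "beta", "gamma"], ["beta", "delta", "gamma"])` should return `{"beta", "gamma"}`; neither side sees the other's non-matching items in the clear.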

Preprint · 2020 · arXiv

Decentralized Privacy-Preserving Proximity Tracing

This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contact's identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the user's phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the user's phone. Other users' apps can use data from the server to locally estimate whether the device's owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.
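The abstract describes rotating pseudo-random IDs and local matching. A minimal sketch, loosely following our understanding of the DP-3T whitepaper's "low-cost" design: a daily key is rotated with a hash (SK_t = H(SK_{t-1})), each daily key is expanded into ephemeral IDs (EphIDs), and a phone checks exposure by re-deriving a positive user's EphIDs from an uploaded key. Assumptions to flag: the whitepaper uses AES in counter mode as its PRG, which this sketch replaces with HMAC-SHA256 over a counter because only the Python standard library is used; `EPHIDS_PER_DAY` and all function names are hypothetical.

```python
import hashlib
import hmac

EPHID_LEN = 16        # bytes per ephemeral ID
EPHIDS_PER_DAY = 96   # hypothetical: one EphID per 15-minute epoch


def next_day_key(sk: bytes) -> bytes:
    """Rotate the daily secret key: SK_t = H(SK_{t-1}) with H = SHA-256."""
    return hashlib.sha256(sk).digest()


def ephids_for_day(sk: bytes, n: int = EPHIDS_PER_DAY) -> list:
    """Expand one daily key into n pseudo-random ephemeral IDs."""
    seed = hmac.new(sk, b"broadcast key", hashlib.sha256).digest()
    stream = b""
    counter = 0
    while len(stream) < n * EPHID_LEN:
        # Stand-in PRG (assumption): HMAC-SHA256 over a block counter.
        stream += hmac.new(seed, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return [stream[i * EPHID_LEN:(i + 1) * EPHID_LEN] for i in range(n)]


def exposed(uploaded_key: bytes, days: int, observed: set) -> bool:
    """Re-derive a positive user's EphIDs from the uploaded key and check
    them locally against the EphIDs this phone recorded nearby."""
    sk = uploaded_key
    for _ in range(days):
        if any(e in observed for e in ephids_for_day(sk)):
            return True
        sk = next_day_key(sk)
    return False
```

Because the rotation is one-way, a server that publishes one uploaded key lets phones derive that day's and later days' EphIDs, but not earlier ones; the matching itself happens only on the phone.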

Preprint · 2020 · arXiv

Measuring Membership Privacy on Aggregate Location Time-Series

While location data is extremely valuable for various applications, disclosing it poses serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location at a time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership Inference Attacks (MIAs) on aggregate location time-series, where an adversary tries to infer whether a specific user contributed to the aggregates. We find that the volume of contributed data, as well as the regularity and particularity of users' mobility patterns, play a crucial role in the attack's success. We experiment with a wide range of defenses based on generalization, hiding, and perturbation, and evaluate their ability to thwart the attack vis-à-vis the utility loss they introduce for various mobility analytics tasks. Our results show that some defenses fail across the board, while others work for specific tasks on aggregate location time-series. For instance, suppressing small counts can be used for ranking hotspots, data generalization for forecasting traffic, hotspot discovery, and map inference, while sampling is effective for location labeling and anomaly detection when the dataset is sparse. Differentially private techniques provide reasonable accuracy only in very specific settings, e.g., discovering hotspots and forecasting their traffic, and more so when using weaker privacy notions like crowd-blending privacy. Overall, our measurements show that no single generic defense preserves the utility of the analytics for arbitrary applications, and they provide useful insights regarding the disclosure of sanitized aggregate location time-series.
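To make "aggregate time-series" and "suppressing small counts" concrete, a minimal sketch: raw traces are turned into per-(location, interval) user counts, and cells below a threshold are released as 0. The trace format, the function names, and the `threshold` parameter are hypothetical illustrations, not the paper's setup or parameters.

```python
from collections import Counter


def aggregate(traces):
    """Count distinct users per (location, interval) cell.

    traces maps a user ID to a list of (location, interval) visits; a user
    contributes at most once to each cell.
    """
    counts = Counter()
    for visits in traces.values():
        counts.update(set(visits))
    return counts


def suppress_small_counts(counts, threshold):
    """Release 0 for every cell whose count is below `threshold`.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return {cell: (c if c >= threshold else 0) for cell, c in counts.items()}
```

With traces `{"u1": [("A", 0), ("A", 1)], "u2": [("A", 0), ("B", 1)], "u3": [("A", 0)]}` and a threshold of 2, the released cell `("A", 0)` is 3 with user u2 and 2 without: suppression hides the low-count cells, yet a membership signal survives in the popular cell, which is why such a defense helps only for some tasks.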

Preprint · 2020 · arXiv

POTs: Protective Optimization Technologies

Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness frameworks' ability to capture a variety of harms caused by systems. We characterize fairness limitations using concepts from requirements engineering and from social sciences. We show that the focus on algorithms' inputs and outputs misses harms that arise from systems interacting with the world; that the focus on bias and discrimination omits broader harms on populations and their environments; and that relying on service providers excludes scenarios where they are not cooperative or intentionally adversarial. We propose Protective Optimization Technologies (POTs). POTs provide means for affected parties to address the negative impacts of systems in the environment, expanding avenues for political contestation. POTs intervene from outside the system, do not require service providers to cooperate, and can serve to correct, shift, or expose harms that systems impose on populations and their environments. We illustrate the potential and limitations of POTs in two case studies: countering road congestion caused by traffic-beating applications, and recalibrating credit scoring for loan applicants.

Preprint · 2020 · arXiv

Privacy Engineering Meets Software Engineering. On the Challenges of Engineering Privacy By Design

Present-day software development relies heavily on service architectures and on agile, iterative development methods to design, implement, and deploy systems. These practices result in systems made up of multiple services that introduce new data flows and evolving designs that escape the control of a single designer. Academic privacy engineering literature typically abstracts away such conditions of software production in order to achieve generalizable results. Yet, through a systematic study of the literature, we show that proposed solutions inevitably make assumptions about software architectures, development methods and scope of designer control that are misaligned with current practices. These misalignments are likely to pose an obstacle to operationalizing privacy engineering solutions in the wild. Specifically, we identify important limitations in the approaches that researchers take to design and evaluate privacy enhancing technologies, which ripple through to proposals for privacy engineering methodologies. Based on our analysis, we delineate the research and actions needed to re-align research with practice, changes that serve as a precondition for operationalizing academic privacy results in common software engineering practices.

Preprint · 2020 · arXiv

Tandem: Securing Keys by Using a Central Server While Preserving Privacy

Users' devices, e.g., smartphones or laptops, are typically incapable of securely storing and processing cryptographic keys. We present Tandem, a novel set of protocols for securing cryptographic keys with support from a central server. Tandem uses one-time-use key-share tokens to preserve users' privacy with respect to a malicious central server. Additionally, Tandem enables users to block their keys if they lose their device, and it enables the server to limit how often an adversary can use an unblocked key. We prove Tandem's security and privacy properties, apply Tandem to attribute-based credentials, and implement a Tandem proof of concept to show that it causes little overhead.
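The core idea of keeping a key usable only with server support can be illustrated with 2-out-of-2 additive secret sharing: the device and the server each hold one share, and each applies its share separately, so the key is never reassembled in one place. This sketch shows only that split. It does not model Tandem's one-time-use key-share tokens, key blocking, or rate limiting, and the toy group, base `G`, and function names are assumptions for illustration.

```python
import secrets

# Toy group: integers mod the prime P = 2^127 - 1. Exponents live mod P - 1,
# the order of the multiplicative group. Illustration only, not secure.
P = 2**127 - 1
ORDER = P - 1
G = 3  # hypothetical base element


def split_key(key: int) -> tuple:
    """Split a secret exponent into additive shares: key = device + server (mod ORDER)."""
    server_share = secrets.randbelow(ORDER)
    device_share = (key - server_share) % ORDER
    return device_share, server_share


def partial_result(share: int) -> int:
    """Each party exponentiates with its own share; neither reveals the share."""
    return pow(G, share, P)


def joint_result(device_part: int, server_part: int) -> int:
    """Multiplying the partial results yields G^key without rebuilding the key."""
    return (device_part * server_part) % P
```

Either share alone is a uniformly random value, so a stolen device (or a curious server) learns nothing about the key from its own share; the server's participation is also what lets it refuse or rate-limit use.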

Preprint · 2020 · arXiv

VoteAgain: A scalable coercion-resistant voting system

The strongest threat model for voting systems considers coercion resistance: protection against coercers that force voters to modify their votes, or to abstain. Existing remote voting systems either do not provide this property; require an expensive tallying phase; or burden users with the need to store cryptographic key material and with the responsibility to deceive their coercers. We propose VoteAgain, a scalable voting scheme that relies on the revoting paradigm to provide coercion resistance. VoteAgain uses a novel deterministic ballot padding mechanism to ensure that coercers cannot see whether a vote has been replaced. This mechanism ensures tallies take quasilinear time, making VoteAgain the first revoting scheme that can handle elections with millions of voters. We prove that VoteAgain provides ballot privacy, coercion resistance, and verifiability, and we demonstrate its scalability using a prototype implementation of all cryptographic primitives.
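The revoting paradigm rests on one counting rule: only each voter's most recent ballot is tallied, so a vote cast under coercion can later be overridden. The plaintext sketch below shows just that rule. It deliberately omits encryption, the deterministic ballot padding, and everything else that hides from a coercer whether a ballot was replaced; the tuple format and function name are hypothetical.

```python
from collections import Counter


def tally_last_ballot(ballots):
    """Count only each voter's most recent ballot.

    ballots: iterable of (voter_id, timestamp, choice) tuples, in plaintext.
    """
    latest = {}
    for voter, ts, choice in ballots:
        # Keep the ballot with the highest timestamp seen for this voter.
        if voter not in latest or ts > latest[voter][0]:
            latest[voter] = (ts, choice)
    return Counter(choice for _, choice in latest.values())
```

For example, if alice votes "X" at time 1 under pressure and "Y" at time 3, and bob votes "Y" at time 2, the tally is two votes for "Y". Doing this filtering over encrypted ballots, without revealing which ballots were superseded, is the hard part the padding mechanism addresses.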

Preprint · 2017 · arXiv

Systematizing Decentralization and Privacy: Lessons from 15 Years of Research and Deployments

Decentralized systems are a subset of distributed systems where multiple authorities control different components and no authority is fully trusted by all. This implies that any component in a decentralized system is potentially adversarial. We review fifteen years of research on decentralization and privacy, and provide an overview of key systems, as well as key insights for designers of future systems. We show that decentralized designs can enhance privacy, integrity, and availability but also require careful trade-offs in terms of system complexity, properties provided, and degree of decentralization. These trade-offs need to be understood and navigated by designers. We argue that a combination of insights from cryptography, distributed systems, and mechanism design, aligned with the development of adequate incentives, is necessary to build scalable and successful privacy-preserving decentralized systems.

Preprint · 2014 · arXiv

Prolonging the Hide-and-Seek Game: Optimal Trajectory Privacy for Location-Based Services

Human mobility is highly predictable. Individuals tend to only visit a few locations with high frequency, and to move among them in a certain sequence reflecting their habits and daily routine. This predictability has to be taken into account in the design of location privacy preserving mechanisms (LPPMs) in order to effectively protect users when they continuously expose their position to location-based services (LBSs). In this paper, we describe a method for creating LPPMs that are customized for a user's mobility profile taking into account privacy and quality of service requirements. By construction, our LPPMs take into account the sequential correlation across the user's exposed locations, providing the maximum possible trajectory privacy, i.e., privacy for the user's present location, as well as past and expected future locations. Moreover, our LPPMs are optimal against a strategic adversary, i.e., an attacker that implements the strongest inference attack knowing both the LPPM operation and the user's mobility profile. The optimality of the LPPMs in the context of trajectory privacy is a novel contribution, and it is achieved by formulating the LPPM design problem as a Bayesian Stackelberg game between the user and the adversary. An additional benefit of our formal approach is that the design parameters of the LPPM are chosen by the optimization algorithm.
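The notion of an LPPM being evaluated against a strategic adversary that knows both the mechanism and the user's profile can be sketched for the simpler single-location case. Given a prior pi(r), a mechanism f(o | r), and a distance d, the optimal Bayesian adversary's expected error is the sum over reports o of min over guesses r̂ of the sum over r of pi(r) f(o | r) d(r̂, r), and that error is the user's privacy. This sketch only evaluates a given mechanism; it does not model trajectories or solve the Stackelberg game (the optimization that chooses the LPPM), and all names are hypothetical.

```python
def adversary_error(prior, lppm, dist):
    """Expected error of an optimal Bayesian adversary against a given LPPM.

    prior[r]    : probability that the user is at location r
    lppm[r][o]  : probability that the mechanism reports o when the user is at r
    dist(a, b)  : distance between locations a and b (the privacy metric)

    The adversary knows both `prior` and `lppm`; for each report o it picks the
    estimate minimising expected distance. The result is the user's privacy.
    """
    locations = list(prior)
    reports = {o for r in locations for o in lppm[r]}
    total = 0.0
    for o in reports:
        # Joint probability Pr(true = r, reported = o) = prior[r] * lppm[r][o]
        joint = {r: prior[r] * lppm[r].get(o, 0.0) for r in locations}
        # Best guess for this report minimises sum_r joint[r] * dist(guess, r)
        total += min(
            sum(joint[r] * dist(guess, r) for r in locations)
            for guess in locations
        )
    return total
```

With two equally likely locations and a 0/1 distance, a truthful mechanism gives the adversary zero error, while one that reports either location with probability 0.5 regardless of the truth yields an expected error of 0.5; a designed LPPM trades points between these extremes against quality of service.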

