Source author record

Úlfar Erlingsson

Úlfar Erlingsson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Artificial Intelligence Data Structures and Algorithms

Catalog footprint

What is connected

7works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a user's value. More fundamentally---by building on anonymity of the users' reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $\varepsilon$-local differential privacy will satisfy $(O(\varepsilon \sqrt{\log(1/δ)/n}), δ)$-central differential privacy. By this, we explain how the high noise and $\sqrt{n}$ overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $\varepsilon$ would indicate---at least if reports are anonymized.

preprint2020arXiv

Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either comprise of systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks---MNIST and CIFAR-10.

preprint2020arXiv

Tempered Sigmoid Activations for Deep Learning with Differential Privacy

Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to less than ideal privacy/utility tradeoffs, as we show here. Instead, we propose that model architectures are chosen ab initio explicitly for privacy-preserving training. To provide guarantees under the gold standard of differential privacy, one must bound as strictly as possible how individual training points can possibly affect model updates. In this paper, we are the first to observe that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning. We demonstrate analytically and experimentally how a general family of bounded activation functions, the tempered sigmoids, consistently outperform unbounded activation functions like ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals or differential privacy analysis.

preprint2020arXiv

That which we call private

The guarantees of security and privacy defenses are often strengthened by relaxing the assumptions made about attackers or the context in which defenses are deployed. Such relaxations can be a highly worthwhile topic of exploration---even though they typically entail assuming a weaker, less powerful adversary---because there may indeed be great variability in both attackers' powers and their context. However, no weakening or contextual discounting of attackers' power is assumed for what some have called "relaxed definitions" in the analysis of differential-privacy guarantees. Instead, the definitions so named are the basis of refinements and more advanced analyses of the worst-case implications of attackers---without any change assumed in attackers' powers. Because they more precisely bound the worst-case privacy loss, these improved analyses can greatly strengthen the differential-privacy upper-bound guarantees---sometimes lowering the differential-privacy epsilon by orders-of-magnitude. As such, to the casual eye, these analyses may appear to imply a reduced privacy loss. This is a false perception: the privacy loss of any concrete mechanism cannot change with the choice of a worst-case-loss upper-bound analysis technique. Practitioners must be careful not to equate real-world privacy with differential-privacy epsilon values, at least not without full consideration of the context.

preprint2016arXiv

Data-driven software security: Models and methods

For computer software, our security models, policies, mechanisms, and means of assurance were primarily conceived and developed before the end of the 1970's. However, since that time, software has changed radically: it is thousands of times larger, comprises countless libraries, layers, and services, and is used for more purposes, in far more complex ways. It is worthwhile to revisit our core computer security concepts. For example, it is unclear whether the Principle of Least Privilege can help dictate security policy, when software is too complex for either its developers or its users to explain its intended behavior. This paper outlines a data-driven model for software security that takes an empirical, data-driven approach to modern software, and determines its exact, concrete behavior via comprehensive, online monitoring. Specifically, this paper briefly describes methods for efficient, detailed software monitoring, as well as methods for learning detailed software statistics while providing differential privacy for its users, and, finally, how machine learning methods can help discover users' expectations for intended software behavior, and thereby help set security policy. Those methods can be adopted in practice, even at very large scales, and demonstrate that data-driven software security models can provide real-world benefits.

preprint2015arXiv

Apples and Oranges: Detecting Least-Privilege Violators with Peer Group Analysis

Clustering software into peer groups based on its apparent functionality allows for simple, intuitive categorization of software that can, in particular, help identify which software uses comparatively more privilege than is necessary to implement its functionality. Such relative comparison can improve the security of a software ecosystem in a number of ways. For example, it can allow market operators to incentivize software developers to adhere to the principle of least privilege, e.g., by encouraging users to use alternative, less-privileged applications for any desired functionality. This paper introduces software peer group analysis, a novel technique to identify least privilege violation and rank software based on the severity of the violation. We show that peer group analysis is an effective tool for detecting and estimating the severity of least privilege violation. It provides intuitive, meaningful results, even across different definitions of peer groups and security-relevant privileges. Our evaluation is based on empirically applying our analysis to over a million software items, in two different online software markets, and on a validation of our assumptions in a medium-scale user study.

preprint2015arXiv

Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries

Techniques based on randomized response enable the collection of potentially sensitive data from clients in a privacy-preserving manner with strong local differential privacy guarantees. One of the latest such technologies, RAPPOR, allows the marginal frequencies of an arbitrary set of strings to be estimated via privacy-preserving crowdsourcing. However, this original estimation process requires a known set of possible strings; in practice, this dictionary can often be extremely large and sometimes completely unknown. In this paper, we propose a novel decoding algorithm for the RAPPOR mechanism that enables the estimation of "unknown unknowns," i.e., strings we do not even know we should be estimating. To enable learning without explicit knowledge of the dictionary, we develop methodology for estimating the joint distribution of two or more variables collected with RAPPOR. This is a critical step towards understanding relationships between multiple variables collected in a privacy-preserving manner.

Úlfar Erlingsson

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Tempered Sigmoid Activations for Deep Learning with Differential Privacy

That which we call private

Data-driven software security: Models and methods

Apples and Oranges: Detecting Least-Privilege Violators with Peer Group Analysis

Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries