Researcher profile

Ananth Raghunathan

Ananth Raghunathan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2020arXiv

Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a user's value. More fundamentally---by building on anonymity of the users' reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $\varepsilon$-local differential privacy will satisfy $(O(\varepsilon \sqrt{\log(1/δ)/n}), δ)$-central differential privacy. By this, we explain how the high noise and $\sqrt{n}$ overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $\varepsilon$ would indicate---at least if reports are anonymized.

preprint2020arXiv

Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either comprise of systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks---MNIST and CIFAR-10.

preprint2020arXiv

Information Leakage in Embedding Models

Embeddings are functions that map raw input data to low-dimensional vector representations, while preserving important semantic information about the inputs. Pre-training embeddings on a large amount of unlabeled data and fine-tuning them for downstream tasks is now a de facto standard in achieving state of the art learning in many domains. We demonstrate that embeddings, in addition to encoding generic semantics, often also present a vector that leaks sensitive information about the input data. We develop three classes of attacks to systematically study information that might be leaked by embeddings. First, embedding vectors can be inverted to partially recover some of the input data. As an example, we show that our attacks on popular sentence embeddings recover between 50\%--70\% of the input words (F1 scores of 0.5--0.7). Second, embeddings may reveal sensitive attributes inherent in inputs and independent of the underlying semantic task at hand. Attributes such as authorship of text can be easily extracted by training an inference model on just a handful of labeled embedding vectors. Third, embedding models leak moderate amount of membership information for infrequent training data inputs. We extensively evaluate our attacks on various state-of-the-art embedding models in the text domain. We also propose and evaluate defenses that can prevent the leakage to some extent at a minor cost in utility.

preprint2020arXiv

That which we call private

The guarantees of security and privacy defenses are often strengthened by relaxing the assumptions made about attackers or the context in which defenses are deployed. Such relaxations can be a highly worthwhile topic of exploration---even though they typically entail assuming a weaker, less powerful adversary---because there may indeed be great variability in both attackers' powers and their context. However, no weakening or contextual discounting of attackers' power is assumed for what some have called "relaxed definitions" in the analysis of differential-privacy guarantees. Instead, the definitions so named are the basis of refinements and more advanced analyses of the worst-case implications of attackers---without any change assumed in attackers' powers. Because they more precisely bound the worst-case privacy loss, these improved analyses can greatly strengthen the differential-privacy upper-bound guarantees---sometimes lowering the differential-privacy epsilon by orders-of-magnitude. As such, to the casual eye, these analyses may appear to imply a reduced privacy loss. This is a false perception: the privacy loss of any concrete mechanism cannot change with the choice of a worst-case-loss upper-bound analysis technique. Practitioners must be careful not to equate real-world privacy with differential-privacy epsilon values, at least not without full consideration of the context.