Source author record

Scott Alfeld

Scott Alfeld appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security Information Theory math.IT math.OC physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

5works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Approximate Data Deletion in Generative Models

Users have the right to have their data deleted by third-party learned systems, as codified by recent legislation such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Such data deletion can be accomplished by full re-training, but this incurs a high computational cost for modern machine learning models. To avoid this cost, many approximate data deletion methods have been developed for supervised learning. Unsupervised learning, in contrast, remains largely an open problem when it comes to (approximate or exact) efficient data deletion. In this paper, we propose a density-ratio-based framework for generative models. Using this framework, we introduce a fast method for approximate data deletion and a statistical test for estimating whether or not training points have been deleted. We provide theoretical guarantees under various learner assumptions and empirically demonstrate our methods across a variety of generative methods.

preprint2022arXiv

Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

The right to erasure requires removal of a user's information from data held by organizations, with rigorous interpretations extending to downstream products such as learned models. Retraining from scratch with the particular user's data omitted fully removes its influence on the resulting model, but comes with a high computational cost. Machine "unlearning" mitigates the cost incurred by full retraining: instead, models are updated incrementally, possibly only requiring retraining when approximation errors accumulate. Rapid progress has been made towards privacy guarantees on the indistinguishability of unlearned and retrained models, but current formalisms do not place practical bounds on computation. In this paper we demonstrate how an attacker can exploit this oversight, highlighting a novel attack surface introduced by machine unlearning. We consider an attacker aiming to increase the computational cost of data removal. We derive and empirically investigate a poisoning attack on certified machine unlearning where strategically designed training data triggers complete retraining when removed.

preprint2022arXiv

Training-Time Attacks against k-Nearest Neighbors

Nearest neighbor-based methods are commonly used for classification tasks and as subroutines of other data-analysis methods. An attacker with the capability of inserting their own data points into the training set can manipulate the inferred nearest neighbor structure. We distill this goal to the task of performing a training-set data insertion attack against $k$-Nearest Neighbor classification ($k$NN). We prove that computing an optimal training-time (a.k.a. poisoning) attack against $k$NN classification is NP-Hard, even when $k = 1$ and the attacker can insert only a single data point. We provide an anytime algorithm to perform such an attack, and a greedy algorithm for general $k$ and attacker budget. We provide theoretical bounds and empirically demonstrate the effectiveness and practicality of our methods on synthetic and real-world datasets. Empirically, we find that $k$NN is vulnerable in practice and that dimensionality reduction is an effective defense. We conclude with a discussion of open problems illuminated by our analysis.

preprint2021arXiv

Optimizing Graph Structure for Targeted Diffusion

The problem of diffusion control on networks has been extensively studied, with applications ranging from marketing to controlling infectious disease. However, in many applications, such as cybersecurity, an attacker may want to attack a targeted subgraph of a network, while limiting the impact on the rest of the network in order to remain undetected. We present a model POTION in which the principal aim is to optimize graph structure to achieve such targeted attacks. We propose an algorithm POTION-ALG for solving the model at scale, using a gradient-based approach that leverages Rayleigh quotients and pseudospectrum theory. In addition, we present a condition for certifying that a targeted subgraph is immune to such attacks. Finally, we demonstrate the effectiveness of our approach through experiments on real and synthetic networks.

preprint2015arXiv

Contamination Estimation via Convex Relaxations

Identifying anomalies and contamination in datasets is important in a wide variety of settings. In this paper, we describe a new technique for estimating contamination in large, discrete valued datasets. Our approach considers the normal condition of the data to be specified by a model consisting of a set of distributions. Our key contribution is in our approach to contamination estimation. Specifically, we develop a technique that identifies the minimum number of data points that must be discarded (i.e., the level of contamination) from an empirical data set in order to match the model to within a specified goodness-of-fit, controlled by a p-value. Appealing to results from large deviations theory, we show a lower bound on the level of contamination is obtained by solving a series of convex programs. Theoretical results guarantee the bound converges at a rate of $O(\sqrt{\log(p)/p})$, where p is the size of the empirical data set.

Scott Alfeld

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Approximate Data Deletion in Generative Models

Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

Training-Time Attacks against k-Nearest Neighbors

Optimizing Graph Structure for Targeted Diffusion

Contamination Estimation via Convex Relaxations