Source author record

Andreas Terzis

Andreas Terzis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Cryptography and Security cs.CY Distributed, Parallel, and Cluster Computing eess.SY Systems and Control

Catalog footprint

What is connected

8works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Debugging Differential Privacy: A Case Study for Privacy Auditing

Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private schemes. In this case study, we audit a recent open source implementation of a differentially private deep learning algorithm and find, with 99.99999999% confidence, that the implementation does not satisfy the claimed differential privacy guarantee.

preprint2022arXiv

Membership Inference Attacks From First Principles

A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instead be evaluated by computing their true-positive rate at low (e.g., <0.1%) false-positive rates, and find most prior attacks perform poorly when evaluated in this way. To address this we develop a Likelihood Ratio Attack (LiRA) that carefully combines multiple ideas from the literature. Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics.

preprint2022arXiv

Poisoning and Backdooring Contrastive Learning

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier requiring control of 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.

preprint2022arXiv

The Privacy Onion Effect: Memorization is Relative

Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect, and understand why it occurs. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Further, it suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.

preprint2022arXiv

Toward Training at ImageNet Scale with Differential Privacy

Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better in the DP setting. Combined, the methods we discuss let us train a Resnet-18 with DP to $47.9\%$ accuracy and privacy parameters $ε= 10, δ= 10^{-6}$. This is a significant improvement over "naive" DP training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. The model we use was pretrained on the Places365 data set as a starting point. We share our code at https://github.com/google-research/dp-imagenet, calling for others to build upon this new baseline to further improve DP at scale.

preprint2021arXiv

Wireless sensor network for in situ soil moisture monitoring

We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland.

preprint2016arXiv

High Frequency Remote Monitoring of Parkinson's Disease via Smartphone: Platform Overview and Medication Response Detection

Objective: The aim of this study is to develop a smartphone-based high-frequency remote monitoring platform, assess its feasibility for remote monitoring of symptoms in Parkinson's disease, and demonstrate the value of data collected using the platform by detecting dopaminergic medication response. Methods: We have developed HopkinsPD, a novel smartphone-based monitoring platform, which measures symptoms actively (i.e. data are collected when a suite of tests is initiated by the individual at specific times during the day), and passively (i.e. data are collected continuously in the background). After data collection, we extract features to assess measures of five key behaviors related to PD symptoms -- voice, balance, gait, dexterity, and reaction time. A random forest classifier is used to discriminate measurements taken after a dose of medication (treatment) versus before the medication dose (baseline). Results: A worldwide study for remote PD monitoring was established using HopkinsPD in July, 2014. This study used entirely remote, online recruitment and installation, demonstrating highly cost-effective scalability. In six months, 226 individuals (121 PD and 105 controls) contributed over 46,000 hours of passive monitoring data and approximately 8,000 instances of structured tests of voice, balance, gait, reaction, and dexterity. To the best of our knowledge, this is the first study to have collected data at such a scale for remote PD monitoring. Moreover, we demonstrate the initial ability to discriminate treatment from baseline with 71.0(+-0.4)% accuracy, which suggests medication response can be monitored remotely via smartphone-based measures.

preprint2014arXiv

Hadoop in Low-Power Processors

In our previous work we introduced a so-called Amdahl blade microserver that combines a low-power Atom processor, with a GPU and an SSD to provide a balanced and energy-efficient system. Our preliminary results suggested that the sequential I/O of Amdahl blades can be ten times higher than that a cluster of conventional servers with comparable power consumption. In this paper we investigate the performance and energy efficiency of Amdahl blades running Hadoop. Our results show that Amdahl blades are 7.7 times and 3.4 times as energy-efficient as the Open Cloud Consortium cluster for a data-intensive and a compute-intensive application, respectively. The Hadoop Distributed Filesystem has relatively poor performance on Amdahl blades because both disk and network I/O are CPU-heavy operations on Atom processors. We demonstrate three effective techniques to reduce CPU consumption and improve performance. However, even with these improvements, the Atom processor is still the system's bottleneck. We revisit Amdahl's law, and estimate that Amdahl blades need four Atom cores to be well balanced for Hadoop tasks.

Andreas Terzis

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Debugging Differential Privacy: A Case Study for Privacy Auditing

Membership Inference Attacks From First Principles

Poisoning and Backdooring Contrastive Learning

The Privacy Onion Effect: Memorization is Relative

Toward Training at ImageNet Scale with Differential Privacy

Wireless sensor network for in situ soil moisture monitoring

High Frequency Remote Monitoring of Parkinson's Disease via Smartphone: Platform Overview and Medication Response Detection

Hadoop in Low-Power Processors