Source author record

Vitaly Shmatikov

Vitaly Shmatikov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Computer Vision Computation and Language Artificial Intelligence cs.CY Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Neural and Evolutionary Computing

Catalog footprint

What is connected

10works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents

Agents operating with computer and Web use inevitably encounter errors: inaccessible webpages, missing files, local and remote misconfigurations, etc. These errors do not thwart agents based on state-of-the-art models. They helpfully continue to look for ways to complete their tasks. We introduce, characterize, and measure a new type of agent failure we call \emph{accidental meltdown}: unsafe or harmful behavior in response to a benign environmental error, in the absence of any adversarial inputs. Because meltdowns are not captured by the existing reliability or safety benchmarks, we develop a taxonomy of meltdown behaviors. We then implement an agent-agnostic infrastructure for injecting simulated local and remote errors into the rollout environment and use it to systematically evaluate agent systems powered by GPT, Grok, and Gemini. Our evaluation demonstrates that meltdowns (e.g., conducting unauthorized reconnaissance or subverting access control) of varying severity and success occur in 64.7\% of agent rollouts that encounter simulated errors, spanning all combinations of agent system, backing model, and error type. In over half of these meltdowns, unsafe behaviors are not reported to the user. Comparing behaviors of the same agents with and without errors, we find that exploration in response to errors is correlated with unsafe and harmful behavior.

preprint2022arXiv

Salvaging Federated Learning by Local Adaptation

Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure privacy and integrity of the fedeated model, latest FL approaches use differential privacy or robust aggregation. We look at FL from the \emph{local} viewpoint of an individual participant and ask: (1) do participants have an incentive to participate in FL? (2) how can participants \emph{individually} improve the quality of their local models, without re-designing the FL framework and/or involving other participants? First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain big accuracy improvements over conventional FL. Participants whose local models are better than the federated model\textemdash and who have no incentive to participate in FL today\textemdash improve less, but sufficiently to make the adapted federated model better than their local models.

preprint2021arXiv

Blind Backdoors in Deep Learning Models

We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model. The attack code creates poisoned training inputs "on the fly," as the model is training, and uses multi-objective optimization to achieve high accuracy on both the main and backdoor tasks. We show how a blind attack can evade any known defense and propose new ones.

preprint2020arXiv

Can we still avoid automatic face detection?

After decades of study, automatic face detection and recognition systems are now accurate and widespread. Naturally, this means users who wish to avoid automatic recognition are becoming less able to do so. Where do we stand in this cat-and-mouse race? We currently live in a society where everyone carries a camera in their pocket. Many people willfully upload most or all of the pictures they take to social networks which invest heavily in automatic face recognition systems. In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition? If so, how? Must evasion techniques be obvious to be effective, or are there still simple measures that users can use to protect themselves? In this work, we find ways to evade face detection on Facebook, a representative example of a popular social network that uses automatic face detection to enhance their service. We challenge widely-held beliefs about evading face detection: do our old techniques such as blurring the face region or wearing "privacy glasses" still work? We show that in general, state-of-the-art detectors can often find faces even if the subject wears occluding clothing or even if the uploader damages the photo to prevent faces from being detected.

preprint2020arXiv

Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning

Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word "meaning" in the sense that distances between words' vectors correspond to their semantic proximity. This enables transfer learning of semantics for a variety of natural language processing tasks. Word embeddings are typically trained on large public corpora such as Wikipedia or Twitter. We demonstrate that an attacker who can modify the corpus on which the embedding is trained can control the "meaning" of new and existing words by changing their locations in the embedding space. We develop an explicit expression over corpus features that serves as a proxy for distance between words and establish a causative relationship between its values and embedding distances. We then show how to use this relationship for two adversarial objectives: (1) make a word a top-ranked neighbor of another word, and (2) move a word from one semantic cluster to another. An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios. We use this attack to manipulate query expansion in information retrieval systems such as resume search, make certain names more or less visible to named entity recognition models, and cause new words to be translated to a particular target word regardless of the language. Finally, we show how the attacker can generate linguistically likely corpus modifications, thus fooling defenses that attempt to filter implausible sentences from the corpus using a language model.

preprint2020arXiv

Overlearning Reveals Sensitive Attributes

"Overlearning" means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races\textemdash even races that are not represented in the training data\textemdash and identities. We demonstrate overlearning in several vision and NLP models and analyze its harmful consequences. First, inference-time representations of an overlearned model reveal sensitive attributes of the input, breaking privacy protections such as model partitioning. Second, an overlearned model can be "re-purposed" for a different, privacy-violating task even in the absence of the original training data. We show that overlearning is intrinsic for some tasks and cannot be prevented by censoring unwanted attributes. Finally, we investigate where, when, and why overlearning happens during model training.

preprint2016arXiv

Defeating Image Obfuscation with Deep Learning

We demonstrate that modern image recognition methods based on artificial neural networks can recover hidden information from images protected by various forms of obfuscation. The obfuscation techniques considered in this paper are mosaicing (also known as pixelation), blurring (as used by YouTube), and P3, a recently proposed system for privacy-preserving photo sharing that encrypts the significant JPEG coefficients to make images unrecognizable by humans. We empirically show how to train artificial neural networks to successfully identify faces and recognize objects and handwritten digits even if the images are protected using any of the above obfuscation techniques.

preprint2016arXiv

Gone in Six Characters: Short URLs Considered Harmful for Cloud Services

Modern cloud services are designed to encourage and support collaboration. To help users share links to online documents, maps, etc., several services, including cloud storage providers such as Microsoft OneDrive and mapping services such as Google Maps, directly integrate URL shorteners that convert long, unwieldy URLs into short URLs, consisting of a domain such as 1drv.ms or goo.gl and a short token. In this paper, we demonstrate that the space of 5- and 6-character tokens included in short URLs is so small that it can be scanned using brute-force search. Therefore, all online resources that were intended to be shared with a few trusted friends or collaborators are effectively public and can be accessed by anyone. This leads to serious security and privacy vulnerabilities. In the case of cloud storage, we focus on Microsoft OneDrive. We show how to use short-URL enumeration to discover and read shared content stored in the OneDrive cloud, including even files for which the user did not generate a short URL. 7% of the OneDrive accounts exposed in this fashion allow anyone to write into them. Since cloud-stored files are automatically copied into users' personal computers and devices, this is a vector for large-scale, automated malware injection. In the case of online maps, we show how short-URL enumeration reveals the directions that users shared with each other. For many individual users, this enables inference of their residential addresses, true identities, and extremely sensitive locations they visited that, if publicly revealed, would violate medical and financial privacy.

preprint2016arXiv

Towards a Privacy Research Roadmap for the Computing Community

Great advances in computing and communication technology are bringing many benefits to society, with transformative changes and financial opportunities being created in health care, transportation, education, law enforcement, national security, commerce, and social interactions. Many of these benefits, however, involve the use of sensitive personal data, and thereby raise concerns about privacy. Failure to address these concerns can lead to a loss of trust in the private and public institutions that handle personal data, and can stifle the independent thought and expression that is needed for our democracy to flourish. This report, sponsored by the Computing Community Consortium (CCC), suggests a roadmap for privacy research over the next decade, aimed at enabling society to appropriately control threats to privacy while enjoying the benefits of information technology and data science. We hope that it will be useful to the agencies of the Federal Networking and Information Technology Research and Development (NITRD) Program as they develop a joint National Privacy Research Strategy over the coming months. The report synthesizes input drawn from the privacy and computing communities submitted to both the CCC and NITRD, as well as past reports on the topic.

preprint2009arXiv

De-anonymizing Social Networks

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.

Vitaly Shmatikov

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents

Salvaging Federated Learning by Local Adaptation

Blind Backdoors in Deep Learning Models

Can we still avoid automatic face detection?

Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning

Overlearning Reveals Sensitive Attributes

Defeating Image Obfuscation with Deep Learning

Gone in Six Characters: Short URLs Considered Harmful for Cloud Services

Towards a Privacy Research Roadmap for the Computing Community

De-anonymizing Social Networks