Source author record

Katherine Lee

Katherine Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; Computation and Language; Artificial Intelligence; astro-ph.GA; Cryptography and Security

Catalog footprint

What is connected: 5 works, 5 topics, 4 close collaborators


Preprint, 2026, arXiv

Exploring the limits of strong membership inference attacks on large language models

State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has relied either on weaker attacks that avoid training reference models (e.g., fine-tuning attacks) or on stronger attacks applied to small models and datasets. However, weaker attacks have been shown to be brittle, and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges prompt an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in four key ways. (1) Strong MIAs can succeed on pre-trained LLMs, but (2) their effectiveness remains limited (e.g., AUC < 0.7) in practical settings. (3) Even when strong MIAs achieve better-than-random AUC, aggregate metrics can conceal substantial per-sample instability in MIA decisions: because of training randomness, many decisions are so unstable that they are statistically indistinguishable from a coin flip. Finally, (4) the relationship between MIA success and related LLM privacy metrics is not as straightforward as prior work has suggested.
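For readers unfamiliar with LiRA, a minimal sketch of its per-example likelihood-ratio test may help. It assumes that each reference model yields one scalar statistic for the target example (here, hypothetically, a negative loss), split into models trained with ("in") and without ("out") that example, and that each group is modeled as a Gaussian. The function names and the choice of statistic are illustrative assumptions, not the paper's code.

```python
import math
from statistics import mean, pstdev


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of the normal distribution N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def lira_score(target_stat, in_stats, out_stats, min_sigma=1e-6):
    """Log likelihood ratio; positive favours 'member', negative favours 'non-member'.

    target_stat: statistic of the example under the attacked model (hypothetical, e.g. -loss).
    in_stats / out_stats: the same statistic under reference models trained
    with / without the example.
    """
    mu_in, mu_out = mean(in_stats), mean(out_stats)
    # Floor the spread so a constant reference set does not cause division by zero.
    sigma_in = max(pstdev(in_stats), min_sigma)
    sigma_out = max(pstdev(out_stats), min_sigma)
    return gaussian_logpdf(target_stat, mu_in, sigma_in) - gaussian_logpdf(
        target_stat, mu_out, sigma_out
    )


if __name__ == "__main__":
    in_refs = [-1.0, -1.2, -0.8]   # made-up statistics, for illustration only
    out_refs = [-2.0, -2.2, -1.8]
    print(lira_score(-1.0, in_refs, out_refs) > 0)  # True: closer to the "in" fit
```

Thresholding this score, or sweeping the threshold to trace an ROC curve and its AUC, turns it into a membership decision. The instability described in point (3) concerns exactly these per-example decisions: retraining the models with different randomness can flip them.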

Preprint, 2022, arXiv

Deduplicating Training Data Makes Language Models Better

We find that existing language modeling datasets contain many near-duplicate examples and long repetitive substrings. As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data. We develop two tools that allow us to deduplicate training datasets; for example, they remove from C4 a single 61-word English sentence that is repeated over 60,000 times. Deduplication allows us to train models that emit memorized text ten times less frequently and require fewer training steps to achieve the same or better accuracy. We can also reduce train-test overlap, which affects over 4% of the validation set of standard datasets, thus allowing for more accurate evaluation. We release code for reproducing our work and performing dataset deduplication at https://github.com/google-research/deduplicate-text-datasets.
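To illustrate what removing repeated substrings means, here is a minimal sketch of exact-substring deduplication over fixed-length token windows. It is not the released tool: it swaps in a simple hash set of k-token windows, assumes plain whitespace tokenization, and the function name and the window length `k` are hypothetical choices for illustration.

```python
def dedup_exact_windows(docs, k):
    """Remove tokens covered by any k-token window already seen earlier in the corpus.

    Windows are checked in corpus order, so the earliest copy of a repeated span
    survives and later copies are dropped. Whitespace tokenization is an
    illustrative assumption, not the paper's tokenizer.
    """
    seen = set()
    cleaned = []
    for doc in docs:
        tokens = doc.split()
        drop = [False] * len(tokens)
        for i in range(len(tokens) - k + 1):
            window = tuple(tokens[i:i + k])
            if window in seen:
                # Mark every token of the repeated window for removal.
                for j in range(i, i + k):
                    drop[j] = True
            else:
                seen.add(window)
        cleaned.append(" ".join(t for t, d in zip(tokens, drop) if not d))
    return cleaned


if __name__ == "__main__":
    docs = ["the cat sat on the mat today", "a dog saw the cat sat on the mat"]
    print(dedup_exact_windows(docs, k=4))
    # ['the cat sat on the mat today', 'a dog saw']
```

A hash set keeps the sketch short but stores every window; at the scale of C4 a more memory-efficient index over the whole corpus is needed, which is what dedicated tools are built for.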

Preprint, 2022, arXiv

What Does it Mean for a Language Model to Preserve Privacy?

Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets. An adversary can exploit this tendency to extract training data. Depending on the nature of the content and the context in which this data was collected, this could violate expectations of privacy. Thus there is a growing interest in techniques for training language models that preserve privacy. In this paper, we discuss the mismatch between the narrow assumptions made by popular data protection techniques (data sanitization and differential privacy), and the broadness of natural language and of privacy as a social norm. We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models. We conclude that language models should be trained on text data which was explicitly produced for public use.

Preprint, 2020, arXiv

WT5?! Training Text-to-Text Models to Explain their Predictions

Neural networks have recently achieved human-level performance on various challenging natural language processing (NLP) tasks, but it is notoriously difficult to understand why a neural network produced a particular prediction. In this paper, we leverage the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural-text explanation alongside their prediction. Crucially, this requires no modifications to the loss function or to the training and decoding procedures: we simply train the model to output the explanation after generating the (natural-text) prediction. We show that this approach not only obtains state-of-the-art results on explainability benchmarks but also permits learning from a limited set of labeled explanations and transferring rationalization abilities across datasets. To facilitate reproducibility and future work, we release the code used to train the models.
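Because the explanation is just more target text, the whole approach lives in how examples are formatted. The sketch below shows one plausible formatting and parsing scheme; the `explain` prefix, the ` explanation: ` separator, and the choice to drop the `explain` prefix for examples without a labeled explanation are assumptions for illustration, not a specification of the released code.

```python
def format_example(task, text, label, explanation=None):
    """Build a (source, target) text pair; the prefixes are illustrative assumptions."""
    if explanation is None:
        # Examples without a labeled explanation: ask only for the prediction.
        return f"{task}: {text}", label
    # Prediction first, then the explanation, all as plain text in one target string.
    return f"explain {task}: {text}", f"{label} explanation: {explanation}"


def parse_output(output):
    """Split generated text back into (prediction, explanation or None)."""
    label, sep, explanation = output.partition(" explanation: ")
    return label.strip(), (explanation.strip() if sep else None)


if __name__ == "__main__":
    src, tgt = format_example(
        "sentiment", "a moving, beautifully shot film", "positive",
        "the reviewer praises the film",
    )
    print(src)                # explain sentiment: a moving, beautifully shot film
    print(parse_output(tgt))  # ('positive', 'the reviewer praises the film')
```

Since both pair types share one model and one loss, mixing a few explained examples with many label-only ones is simply a matter of which formatter branch each example takes.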

Preprint, 2012, arXiv

Filamentary Star Formation: Observing the Evolution toward Flattened Envelopes

Filamentary structures are ubiquitous, from large-scale molecular clouds (a few parsecs) to small-scale circumstellar envelopes around Class 0 sources (~1000 AU to ~0.1 pc). In particular, recent observations with the Herschel Space Observatory emphasize the importance of large-scale filaments (a few parsecs) for star formation. The small-scale flattened envelopes around Class 0 sources are reminiscent of these large-scale filaments. We propose an observationally derived scenario for filamentary star formation that describes the evolution of filaments as part of the process by which cores and circumstellar envelopes form. If this scenario is correct, small-scale filamentary structures (0.1 pc in length) with higher densities should exist embedded in starless cores, although to date almost all interferometric observations have failed to detect such structures. We perform synthetic observations of filaments at the prestellar stage by modeling the known Class 0 flattened envelope in L1157, using both the Combined Array for Research in Millimeter-wave Astronomy (CARMA) and the Atacama Large Millimeter/Submillimeter Array (ALMA). We show that, with reasonable estimates for the column density through the flattened envelope, the CARMA D-array at 3 mm wavelength cannot detect such filamentary structure, so previous studies would not have detected it. However, the substructures may be detected with the CARMA D+E array at 3 mm and the CARMA E array at 1 mm, thanks to more appropriate resolution and sensitivity. With its unprecedented sensitivity, ALMA can also detect the substructures and show them in greater detail than CARMA. Such a detection would confirm the newly proposed paradigm of non-spherical star formation.