
Kurt Thomas

Kurt Thomas appears in the imported research catalog. Authorship, coauthor, and topic links are available, although the profile has not yet been claimed.


Topics: Cryptography and Security, Artificial Intelligence, Machine Learning

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators



Preprint, 2026, arXiv

DROIDCCT: Cryptographic Compliance Test via Trillion-Scale Measurement

We develop DroidCCT, a distributed test framework for evaluating the scale of a wide range of cryptographic failures and bugs that affect end users. DroidCCT relies on passive analysis of artifacts from the execution of cryptographic operations in the Android ecosystem to identify weak implementations. We collect trillions of samples from cryptographic operations of Android Keystore on half a billion devices and apply several analysis techniques to evaluate the quality of cryptographic output from these devices and their underlying implementations. Our study reveals several patterns of bugs and weaknesses in cryptographic implementations from various manufacturers and chipsets. We show that the heterogeneous nature of cryptographic implementations results in non-uniform availability and reliability of various cryptographic functions. More importantly, flaws such as weakly generated random parameters and timing side channels may surface across deployments of cryptography. Our results highlight the importance of fault- and side-channel-resistant cryptography and of the ability to test these implementations transparently and openly.

Preprint, 2026, arXiv

ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into concrete security impact such as unauthorized file access or code execution. Exploitation is a particularly challenging task because it requires low-level program reasoning (e.g., about memory layout), runtime adaptation, and sustained progress over long horizons. It is also inherently dual-use, supporting defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, and realistic benchmark of the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents with progressively extending it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains: userspace programs, Google's V8 JavaScript engine, and the Linux kernel. We vary the security protections applied to each instance, isolating their impact on agent performance. All configurations are packaged in reproducible containerized environments. Our evaluation shows that while exploitation remains challenging, frontier models can successfully exploit a non-trivial fraction of vulnerabilities. For example, the strongest configurations, Anthropic's latest model Claude Mythos Preview and OpenAI's GPT-5.5, produce working exploits for 157 and 120 instances, respectively. Notably, models retain non-trivial success rates even with widely used defenses enabled. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.

Preprint, 2026, arXiv

Understanding Help Seeking for Digital Privacy, Safety, and Security

The burden of navigating complex digital privacy, safety, and security threats often falls directly on users. This leads users to seek help from family and peers, platforms and advice guides, dedicated communities, and even large language models (LLMs). As a precursor to improving resources across this ecosystem, our community needs to understand what help seeking looks like in the wild. To that end, we combine qualitative coding with LLM fine-tuning to sift through over one billion Reddit posts from the past four years and identify where, and for what, users seek digital privacy, safety, or security help. We isolate three million relevant posts with 93% precision and recall and automatically annotate each with the topics discussed (e.g., security tools, privacy configurations, scams, account compromise, content moderation, and more). We use this dataset to understand the scope and scale of help seeking, the communities that provide help, and the types of help sought. Our work informs the development of better resources for users (e.g., user guides or LLM help-giving agents) while underscoring the inherent challenges of supporting users through complex combinations of threats, platforms, mitigations, context, and emotions.
