Researcher profile

Jeremiah Birrell

Jeremiah Birrell contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Information Theoretic Adversarial Training of Large Language Models

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous adversarial training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradient-based perturbations in the embedding space, enabling more efficient and expressive attacks. Building on this paradigm, we propose WARDEN, a distributionally robust adversarial training framework for LLMs that dynamically reweights adversarial examples through an f -divergence ambiguity set around the empirical training distribution. Our method optimizes the worst-case adversarial loss within a divergence ball around the empirical data distribution, automatically emphasizing harder adversarial examples. Using the convex dual formulation, the objective reduces to a log-sum-exp form under the KL divergence, with a dynamical parameter controlling the strength of reweighting. This study leads to a new class of information-theoretic objectives that significantly reduce attack success rates while maintaining model utility. Across multiple LLMs and attack settings, WARDEN substantially reduces attack success rates with computational and utility costs comparable to CAT-, CAPO-, and MixAT-based baselines, making it a practical approach for scalable robust alignment.

preprint2025arXiv

Concentration Inequalities for Stochastic Optimization of Unbounded Objective Functions with Application to Denoising Score Matching

We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations utilize the following key tools: 1) A new form of McDiarmid's inequality that is based on sample-dependent one-component mean-difference bounds and which leads to a novel uniform law of large numbers result for unbounded functions. 2) A new Rademacher complexity bound for families of functions that satisfy an appropriate sample-dependent Lipschitz property, which allows for application to a large class of distributions with unbounded support. As an application of these results, we derive statistical error bounds for denoising score matching (DSM), an application that inherently requires one to consider unbounded objective functions and distributions with unbounded support, even in cases where the data distribution has bounded support. In addition, our results quantify the benefit of sample-reuse in algorithms that employ easily-sampled auxiliary random variables in addition to the training data, e.g., as in DSM, which uses auxiliary Gaussian random variables.

preprint2022arXiv

Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation

Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is currently lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils the local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.

preprint2022arXiv

Structure-preserving GANs

Generative adversarial networks (GANs), a class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence between the unknown and the generated distributions. We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. Our theory shows that we can reduce the discriminator space to its projection on the invariant discriminator space, using the conditional expectation with respect to the sigma-algebra associated to the underlying structure. In addition, we prove that the discriminator space reduction must be accompanied by a careful design of structured generators, as flawed designs may easily lead to a catastrophic "mode collapse" of the learned distribution. We contextualize our framework by building symmetry-preserving GANs for distributions with intrinsic group symmetry, and demonstrate that both players, namely the equivariant generator and invariant discriminator, play important but distinct roles in the learning process. Empirical experiments and ablation studies across a broad range of data sets, including real-world medical imaging, validate our theory, and show our proposed methods achieve significantly improved sample fidelity and diversity -- almost an order of magnitude measured in Fréchet Inception Distance -- especially in the small data regime.

preprint2020arXiv

Langevin equations in the small-mass limit: Higher-order approximations

We study the small-mass (overdamped) limit of Langevin equations for a particle in a potential and/or magnetic field with matrix-valued and state-dependent drift and diffusion. We utilize a bootstrapping argument to derive a hierarchy of approximate equations for the position degrees of freedom that are able to achieve accuracy of order $m^{\ell/2}$ over compact time intervals for any $\ell\in\mathbb{Z}^+$. This generalizes prior derivations of the homogenized equation for the position degrees of freedom in the $m\to 0$ limit, which result in order $m^{1/2}$ approximations. Our results cover bounded forces, for which we prove convergence in $L^p$ norms, and unbounded forces, in which case we prove convergence in probability.

preprint2020arXiv

Quantification of Model Uncertainty on Path-Space via Goal-Oriented Relative Entropy

Quantifying the impact of parametric and model-form uncertainty on the predictions of stochastic models is a key challenge in many applications. Previous work has shown that the relative entropy rate is an effective tool for deriving path-space uncertainty quantification (UQ) bounds on ergodic averages. In this work we identify appropriate information-theoretic objects for a wider range of quantities of interest on path-space, such as hitting times and exponentially discounted observables, and develop the corresponding UQ bounds. In addition, our method yields tighter UQ bounds, even in cases where previous relative-entropy-based methods also apply, e.g., for ergodic averages. We illustrate these results with examples from option pricing, non-reversible diffusion processes, stochastic control, semi-Markov queueing models, and expectations and distributions of hitting times.

preprint2020arXiv

Uncertainty Quantification for Markov Processes via Variational Principles and Functional Inequalities

Information-theory based variational principles have proven effective at providing scalable uncertainty quantification (i.e. robustness) bounds for quantities of interest in the presence of nonparametric model-form uncertainty. In this work, we combine such variational formulas with functional inequalities (Poincar{é}, $\log$-Sobolev, Liapunov functions) to derive explicit uncertainty quantification bounds for time-averaged observables, comparing a Markov process to a second (not necessarily Markov) process. These bounds are well-behaved in the infinite-time limit and apply to steady-states of both discrete and continuous-time Markov processes.