Source author record

Jeremiah Birrell

Jeremiah Birrell appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


hep-ph, nucl-th, astro-ph.CO, math.PR, Machine Learning, gr-qc, Information Theory, math-ph, math.IT, math.MP, Artificial Intelligence, astro-ph.EP, Cryptography and Security, hep-th, math.CA, math.DS, math.NA, physics.geo-ph

Catalog footprint

What is connected

20 works

18 topics

4 close collaborators


preprint, 2026, arXiv

Information Theoretic Adversarial Training of Large Language Models

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous Adversarial Training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradient-based perturbations in the embedding space, enabling more efficient and expressive attacks. Building on this paradigm, we propose WARDEN, a distributionally robust adversarial training framework for LLMs that dynamically reweights adversarial examples through an $f$-divergence ambiguity set around the empirical training distribution. Our method optimizes the worst-case adversarial loss within a divergence ball around the empirical data distribution, automatically emphasizing harder adversarial examples. Using the convex dual formulation, the objective reduces to a log-sum-exp form under the KL divergence, with a dynamical parameter controlling the strength of reweighting. This study leads to a new class of information-theoretic objectives that significantly reduce attack success rates while maintaining model utility. Across multiple LLMs and attack settings, WARDEN substantially reduces attack success rates with computational and utility costs comparable to CAT-, CAPO-, and MixAT-based baselines, making it a practical approach for scalable robust alignment.
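The log-sum-exp form mentioned in this abstract can be illustrated with the standard dual of a KL-constrained worst case: for per-example losses $\ell_i$ and a temperature $\lambda>0$, the term $\lambda\log\frac{1}{n}\sum_i e^{\ell_i/\lambda}$ lies between the average and the maximum loss, and its gradient weights each example by a softmax of $\ell_i/\lambda$. The sketch below is a minimal NumPy illustration under that assumption only. The function name `kl_dro_objective`, the fixed value of `lam`, and the toy losses are hypothetical, and this is not the WARDEN implementation.

```python
import numpy as np

def kl_dro_objective(losses, lam):
    """Return (lam * log-mean-exp(losses / lam), softmax weights).

    Hypothetical helper: `lam` plays the role of the reweighting
    strength and is held fixed here rather than updated dynamically.
    """
    losses = np.asarray(losses, dtype=float)
    z = losses / lam
    z_max = z.max()
    # Shift by the maximum before exponentiating for numerical stability.
    shifted = np.exp(z - z_max)
    log_mean_exp = z_max + np.log(shifted.mean())
    # Weights each example would receive in the gradient of the objective.
    weights = shifted / shifted.sum()
    return lam * log_mean_exp, weights

# Toy per-example adversarial losses (made-up numbers).
losses = np.array([0.1, 0.5, 2.0, 4.0])
for lam in (100.0, 1.0, 0.1):
    value, weights = kl_dro_objective(losses, lam)
    print(f"lam={lam:>5}: objective={value:.3f}, weights={np.round(weights, 3)}")
```

A large `lam` gives nearly uniform weights and an objective close to the mean loss, while a small `lam` concentrates the weight on the hardest examples and pushes the objective toward the maximum loss.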

preprint, 2025, arXiv

Concentration Inequalities for Stochastic Optimization of Unbounded Objective Functions with Application to Denoising Score Matching

We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations utilize the following key tools: 1) A new form of McDiarmid's inequality that is based on sample-dependent one-component mean-difference bounds and which leads to a novel uniform law of large numbers result for unbounded functions. 2) A new Rademacher complexity bound for families of functions that satisfy an appropriate sample-dependent Lipschitz property, which allows for application to a large class of distributions with unbounded support. As an application of these results, we derive statistical error bounds for denoising score matching (DSM), an application that inherently requires one to consider unbounded objective functions and distributions with unbounded support, even in cases where the data distribution has bounded support. In addition, our results quantify the benefit of sample-reuse in algorithms that employ easily-sampled auxiliary random variables in addition to the training data, e.g., as in DSM, which uses auxiliary Gaussian random variables.

preprint, 2022, arXiv

Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation

Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is currently lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils the local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
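As a concrete instance of a variational representation, the well-known Donsker-Varadhan formula writes the KL divergence as $\mathrm{KL}(P\|Q)=\sup_g\{E_P[g]-\log E_Q[e^g]\}$, with the supremum attained at $g=\log(dP/dQ)$ up to an additive constant. The sketch below checks this on a small discrete example. It illustrates only this classical representation, not the tighter ones developed in the paper, and the distributions `p`, `q` and the function names are made up for illustration.

```python
import numpy as np

def kl_divergence(p, q):
    """Exact KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def dv_objective(g, p, q):
    """Donsker-Varadhan objective E_p[g] - log E_q[exp(g)] for a test function g."""
    g = np.asarray(g, dtype=float)
    return float(np.dot(p, g) - np.log(np.dot(q, np.exp(g))))

# Hypothetical three-point distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

g_star = np.log(p / q)                      # optimizer: the log-likelihood ratio
rng = np.random.default_rng(0)
g_other = g_star + rng.normal(size=p.size)  # a perturbed, suboptimal test function

print("exact KL        :", kl_divergence(p, q))
print("DV at optimizer :", dv_objective(g_star, p, q))
print("DV at perturbed :", dv_objective(g_other, p, q))
```

Any test function gives a lower bound on the divergence, so in practice $g$ is parameterized (for example by a neural network) and the objective is maximized from samples.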

preprint, 2022, arXiv

Structure-preserving GANs

Generative adversarial networks (GANs), a class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence between the unknown and the generated distributions. We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. Our theory shows that we can reduce the discriminator space to its projection on the invariant discriminator space, using the conditional expectation with respect to the sigma-algebra associated to the underlying structure. In addition, we prove that the discriminator space reduction must be accompanied by a careful design of structured generators, as flawed designs may easily lead to a catastrophic "mode collapse" of the learned distribution. We contextualize our framework by building symmetry-preserving GANs for distributions with intrinsic group symmetry, and demonstrate that both players, namely the equivariant generator and invariant discriminator, play important but distinct roles in the learning process. Empirical experiments and ablation studies across a broad range of data sets, including real-world medical imaging, validate our theory, and show our proposed methods achieve significantly improved sample fidelity and diversity -- almost an order of magnitude measured in Fréchet Inception Distance -- especially in the small data regime.

preprint, 2020, arXiv

Langevin equations in the small-mass limit: Higher-order approximations

We study the small-mass (overdamped) limit of Langevin equations for a particle in a potential and/or magnetic field with matrix-valued and state-dependent drift and diffusion. We utilize a bootstrapping argument to derive a hierarchy of approximate equations for the position degrees of freedom that are able to achieve accuracy of order $m^{\ell/2}$ over compact time intervals for any $\ell\in\mathbb{Z}^+$. This generalizes prior derivations of the homogenized equation for the position degrees of freedom in the $m\to 0$ limit, which result in order $m^{1/2}$ approximations. Our results cover bounded forces, for which we prove convergence in $L^p$ norms, and unbounded forces, in which case we prove convergence in probability.

preprint, 2020, arXiv

Quantification of Model Uncertainty on Path-Space via Goal-Oriented Relative Entropy

Quantifying the impact of parametric and model-form uncertainty on the predictions of stochastic models is a key challenge in many applications. Previous work has shown that the relative entropy rate is an effective tool for deriving path-space uncertainty quantification (UQ) bounds on ergodic averages. In this work we identify appropriate information-theoretic objects for a wider range of quantities of interest on path-space, such as hitting times and exponentially discounted observables, and develop the corresponding UQ bounds. In addition, our method yields tighter UQ bounds, even in cases where previous relative-entropy-based methods also apply, e.g., for ergodic averages. We illustrate these results with examples from option pricing, non-reversible diffusion processes, stochastic control, semi-Markov queueing models, and expectations and distributions of hitting times.

preprint, 2020, arXiv

Uncertainty Quantification for Markov Processes via Variational Principles and Functional Inequalities

Information-theory based variational principles have proven effective at providing scalable uncertainty quantification (i.e. robustness) bounds for quantities of interest in the presence of nonparametric model-form uncertainty. In this work, we combine such variational formulas with functional inequalities (Poincaré, $\log$-Sobolev, Liapunov functions) to derive explicit uncertainty quantification bounds for time-averaged observables, comparing a Markov process to a second (not necessarily Markov) process. These bounds are well-behaved in the infinite-time limit and apply to steady-states of both discrete and continuous-time Markov processes.

preprint, 2016, arXiv

The hot Hagedorn Universe

In the context of the half-centenary of Hagedorn temperature and the statistical bootstrap model (SBM) we present a short account of how these insights coincided with the establishment of the hot big-bang model (BBM) and helped resolve some of the early philosophical difficulties. We then turn attention to the present day context and show the dominance of strong interaction quark and gluon degrees of freedom in the early stage, helping to characterize the properties of the hot Universe. We focus attention on the current experimental insights about cosmic microwave background (CMB) temperature fluctuation, and develop a much improved understanding of the neutrino freeze-out, in this way paving the path to the opening of a direct connection of quark-gluon plasma (QGP) physics in the early Universe with the QCD-lattice, and the study of the properties of QGP formed in the laboratory.

preprint, 2015, arXiv

A Recursive Method for Computing Certain Bessel Function Integrals

We investigate a family of integrals involving modified Bessel functions that arise in the context of neutrino scattering. Recursive formulas are derived for evaluating these integrals and their asymptotic expansions are computed. We prove in certain cases that the asymptotic expansion yields the exact result after a finite number of terms. In each of these cases we derive a formula that bounds the order at which the expansion terminates. The method of calculation developed in this paper is applicable to similar families of integrals that involve Bessel or modified Bessel functions.

preprint, 2015, arXiv

Dynamical Emergence of the Universe into the False Vacuum

We study how the hot Universe evolves and acquires the prevailing vacuum state, demonstrating that in specific conditions which are believed to apply, the Universe becomes frozen into the state with the smallest value of Higgs vacuum field $v=\langle h\rangle$, even if this is not the state of lowest energy. This supports the false vacuum dark energy $Λ$-model. Under several likely hypotheses we determine the temperature in the evolution of the Universe at which two vacua $v_1, v_2$ can swap between being true and false. We evaluate the dynamical surface pressure on domain walls between low and high mass vacua due to the presence of matter and show that the low mass state remains the preferred vacuum of the Universe.

preprint, 2015, arXiv

Proposal for Resonant Detection of Relic Massive Neutrinos

We present a novel method for detecting the relic neutrino background that takes advantage of structured quantum degeneracy to amplify the drag force from neutrinos scattering off a detector. Developing this idea, we present a characterization of the present day relic neutrino distribution in an arbitrary frame, including the influence of neutrino mass and neutrino reheating by $e^+e^-$ annihilation. We present explicitly the neutrino velocity and de Broglie wavelength distributions for the case of an Earthbound observer. Considering that relic neutrinos could exhibit quantum liquid features at the present day temperature and density, we discuss the impact of neutrino fluid correlations on the possibility of resonant detection.

preprint, 2014, arXiv

Boltzmann Equation Solver Adapted to Emergent Chemical Non-equilibrium

We present a novel method to solve the spatially homogeneous and isotropic relativistic Boltzmann equation. We employ a basis set of orthogonal polynomials dynamically adapted to allow for emergence of chemical non-equilibrium. Two time dependent parameters characterize the set of orthogonal polynomials, the effective temperature $T(t)$ and phase space occupation factor $Υ(t)$. In this first paper we address (effectively) massless fermions and derive dynamical equations for $T(t)$ and $Υ(t)$ such that the zeroth order term of the basis alone captures the particle number density and energy density of each particle distribution. We validate our method and illustrate the reduced computational cost and the ability to easily represent final state chemical non-equilibrium by studying a model problem that is motivated by the physics of the neutrino freeze-out processes in the early Universe, where the essential physical characteristics include reheating from another disappearing particle component ($e^\pm$-annihilation).

preprint, 2014, arXiv

Non-Equilibrium Aspects of Relic Neutrinos: From Freeze-out to the Present Day

In this dissertation, we study the evolution and properties of the relic (or cosmic) neutrino distribution from neutrino freeze-out at $T=O(1)$ MeV through the free-streaming era up to today, focusing on the deviation of the neutrino spectrum from equilibrium. In particular, we demonstrate the presence of chemical non-equilibrium that continues to the present day. The work naturally separates into two parts. The first focuses on aspects of the relic neutrinos that can be explored using conservation laws. The second part studies the neutrino distribution using the full general relativistic Boltzmann equation. Part one begins with an overview of the history of the Universe, from just prior to neutrino freeze-out up through the present day, placing the history of cosmic neutrino evolution in its proper context. Motivated by the Planck CMB measurements of the effective number of neutrinos, we derive those properties of neutrino freeze-out that depend only on conservation laws and are independent of the details of the scattering processes. Part one ends with a characterization of the present day neutrino spectrum as seen from Earth. The second part of this dissertation focuses on the properties of cosmic neutrinos that depend on the details of the neutrino reactions, as is necessary for modeling the non-thermal distortions from equilibrium and computing freeze-out temperatures. We detail a new spectral method for solving the Boltzmann equation, based on a dynamical basis of orthogonal polynomials, as well as an improved procedure for analytically simplifying the corresponding scattering integrals for subsequent numerical computation. We apply these novel solution methods to solve the Boltzmann equation through the neutrino freeze-out period and perform parametric studies of the dependence of neutrino freeze-out on standard model parameters.

preprint, 2014, arXiv

Relic Neutrino Freeze-out: Dependence on Natural Constants

Analysis of cosmic microwave background radiation fluctuations favors an effective number of neutrinos, $N_ν>3$. This motivates a reinvestigation of the neutrino freeze-out process. Here we characterize the dependence of $N_ν$ on the Standard Model (SM) parameters that govern neutrino freeze-out. We show that $N_ν$ depends on a combination $η$ of several natural constants characterizing the relative strength of weak interaction processes in the early Universe and on the Weinberg angle $\sin^2θ_W$. We determine numerically the dependence $N_ν(η,\sin^2θ_W)$ and discuss these results. The extensive numerical computations are made possible by two novel numerical procedures: a spectral method Boltzmann equation solver adapted to allow emerging chemical non-equilibrium, and a method to evaluate Boltzmann equation collision integrals that generates a smooth integrand.
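For orientation, the effective number of neutrinos is conventionally defined by comparing the neutrino energy density with that of the photons. The relations below are the usual textbook convention, stated under the assumption of instantaneous neutrino decoupling before $e^\pm$ annihilation. They are not taken from this abstract, and the symbols $\rho_ν$ and $\rho_γ$ (neutrino and photon energy densities) are introduced here only for illustration.

```latex
\rho_\nu = N_\nu \,\frac{7}{8}\left(\frac{4}{11}\right)^{4/3}\rho_\gamma ,
\qquad
\frac{T_\nu}{T_\gamma} = \left(\frac{4}{11}\right)^{1/3} \approx 0.714 .
```

With three neutrino flavors and exactly instantaneous decoupling this gives $N_ν=3$, so a value $N_ν>3$ indicates additional energy in the neutrino component or in other relativistic species.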

preprint, 2013, arXiv

Compact Ultra Dense Matter Impactors

We study interactions of meteorlike compact ultradense objects (CUDO), having nuclear or greater density, with Earth and other rocky bodies in the Solar System as a possible source of information about novel forms of matter. We study the energy loss in CUDO puncture of the body and discuss differences between regular matter and CUDO impacts.

preprint, 2013, arXiv

Fugacity and Reheating of Primordial Neutrinos

We clarify in a quantitative way the impact that distinct chemical $T_c$ and kinetic $T_k$ freeze-out temperatures have on the reduction of the neutrino fugacity $Υ_ν$ below equilibrium, i.e. $Υ_ν<1$, and the increase of the neutrino temperature $T_ν$ via partial reheating. We establish the connection between $Υ_ν$ and $T_k$ via the modified reheating relation $T_ν(Υ_ν)/T_γ$, where $T_γ$ is the temperature of the background radiation. Our results demonstrate that one must introduce the chemical nonequilibrium parameter, i.e., the fugacity, $Υ_ν$, as an additional standard cosmological model parameter in the evaluation of CMB fluctuations as its value allows measurement of $T_k$.

preprint, 2013, arXiv

Relic neutrinos: Physically consistent treatment of effective number of neutrinos and neutrino mass

We perform a model-independent study of the neutrino momentum distribution at freeze-out, treating the freeze-out temperature as a free parameter. Our results imply that measurement of neutrino reheating, as characterized by the measurement of the effective number of neutrinos $N_ν$, amounts to the determination of the neutrino kinetic freeze-out temperature within the context of the standard model of particle physics where the number of neutrino flavors is fixed and no other massless (fractional) particles arise. At temperatures on the order of the neutrino mass, we show how cosmic background neutrino properties, i.e., energy density, pressure, and particle density, are modified in a physically consistent way as a function of neutrino mass and $N_ν$.

preprint, 2013, arXiv

Traveling Through the Universe: Back in Time to the Quark-Gluon Plasma Era

We survey the early history of the discovery of quark gluon plasma and the early history of the Universe, beginning with the present day and reaching deep into QGP and almost beyond. We introduce cosmological Universe dynamics and connect the different Universe epochs with one another. We describe some of the many remaining open questions that emerge.

preprint, 2012, arXiv

Possibility of Electroweak Phase Transition at Low Temperature

We study models of strong first order 'low' temperature electroweak phase transition. To achieve this we propose a class of Higgs effective potential models which preserve the known features of the present day massive phase. However, the properties of the symmetry restored massless phase are modified in a way that for a large parameter domain we find a strong first order transition occurring at a temperature hundreds of times lower than previously considered possible.

preprint, 2011, arXiv

Transition from ergodic to explosive behavior in a family of stochastic differential equations

We study a family of quadratic stochastic differential equations in the plane, motivated by applications to turbulent transport of heavy particles. Using Lyapunov functions, we find a critical parameter value $α_{1}=α_{2}$ such that when $α_{2}>α_{1}$ the system is ergodic and when $α_{2}<α_{1}$ solutions are not defined for all times. Hörmander's hypoellipticity theorem and geometric control theory are also utilized.
