Source author record

Andrew Lee

Andrew Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

hep-lat, Machine Learning, Artificial Intelligence, astro-ph, Computation, Computation and Language, Computer Vision, math.CO, Methodology

Catalog footprint

What is connected

8 works

9 topics

4 close collaborators


Preprint · 2026 · arXiv

Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions

While researchers are finding concepts represented as linear directions in language models, a bag of linear directions fails to capture relational structure. To better understand this dichotomy, we study a model with known linear representations, but trained in a highly structured domain: the board game Othello. While the model's internal board-state representation is linearly decodable, we find additional structure in the form of tensor product representations (TPRs). We train TPR probes to recover shared structure amongst the linear probes, yielding a factorization into square-embeddings, color-embeddings, and a binding matrix that composes them to construct the model's board-state representation. We find geometric signatures within the weights of our TPR probe that align with the structure of the board, but perhaps more importantly, that the linear probes can be recovered directly from the parameters of our TPR probe. Our findings suggest that directional representations may be projections of more structured underlying representations.
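To make the factorization concrete, here is a toy NumPy sketch, assuming each linear probe direction for "square s holds color c" is obtained by applying a binding matrix to the flattened outer product (color embedding ⊗ square embedding). The abstract does not give the probe's exact parameterization, so the sizes, the three color classes, the random weights and every name below (`square_emb`, `color_emb`, `binding`, `linear_probe_direction`, `all_linear_probes`, `probe_logits`) are hypothetical; the sketch only illustrates how a full set of linear probes can be read off a small number of shared factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 Othello squares, 3 color classes, small factor dims,
# and a d_model-sized hidden state. Weights are random, not trained.
n_squares, n_colors = 64, 3
d_sq, d_col, d_model = 8, 4, 32

square_emb = rng.normal(size=(n_squares, d_sq))       # one vector per square
color_emb = rng.normal(size=(n_colors, d_col))        # one vector per color class
binding = rng.normal(size=(d_model, d_col * d_sq))    # maps color ⊗ square into model space


def linear_probe_direction(square: int, color: int) -> np.ndarray:
    """Probe direction for (square, color), built only from the TPR factors."""
    bound = np.outer(color_emb[color], square_emb[square]).ravel()  # flattened color ⊗ square
    return binding @ bound


def all_linear_probes() -> np.ndarray:
    """Every (square, color) direction at once, shape (n_squares, n_colors, d_model)."""
    b = binding.reshape(d_model, d_col, d_sq)  # b[m, c, s] == binding[m, c * d_sq + s]
    return np.einsum("mcs,kc,js->jkm", b, color_emb, square_emb)


def probe_logits(hidden: np.ndarray) -> np.ndarray:
    """Score each (square, color) pair for one hidden state of shape (d_model,)."""
    return all_linear_probes() @ hidden  # shape (n_squares, n_colors)
```

Calling `probe_logits(h).argmax(axis=1)` on a hidden state `h` would give one predicted color class per square. The parameter count of the factors (64·8 + 3·4 + 32·32) is smaller than that of 192 independent 32-dimensional probes, which is the kind of shared structure the abstract describes.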

Preprint · 2024 · arXiv

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms by which models become "aligned", thus making it difficult to explain phenomena like jailbreaks. In this work we study a popular algorithm, direct preference optimization (DPO), and the mechanisms by which it reduces toxicity. Namely, we first study how toxicity is represented and elicited in a pre-trained language model, GPT2-medium. We then apply DPO with a carefully crafted pairwise dataset to reduce toxicity. We examine how the resulting model averts toxic outputs, and find that capabilities learned from pre-training are not removed, but rather bypassed. We use this insight to demonstrate a simple method to un-align the model, reverting it back to its toxic behavior.
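The abstract does not spell out the DPO objective, so for reference here is a minimal sketch of the standard per-pair DPO loss as introduced by Rafailov et al. (2023): $-\log\sigma\big(\beta[(\log\pi_\theta(y_w\mid x)-\log\pi_{\mathrm{ref}}(y_w\mid x))-(\log\pi_\theta(y_l\mid x)-\log\pi_{\mathrm{ref}}(y_l\mid x))]\big)$. It assumes the summed sequence log-probabilities have already been computed; the function names and the `beta=0.1` default are illustrative choices, not settings taken from this paper. In a toxicity setup the preferred continuation $y_w$ would presumably be the non-toxic one and $y_l$ the toxic one.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative beta)."""
    # Log-ratio of policy to frozen reference model for each completion.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow.
    return softplus(-margin)
```

When the policy equals the reference model the margin is zero and the loss is $\log 2$; raising the policy's relative likelihood of the preferred completion lowers it.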

Preprint · 2022 · arXiv

Hypergraph Fuss-Catalan Numbers

The Catalan numbers $C_n$ are an extremely well-studied sequence of numbers that appear as the answer to many combinatorial problems. Two generalizations of these numbers that have been studied are the Fuss-Catalan numbers and the Hypergraph Catalan numbers. In this paper, we study the combination of these, the Hypergraph Fuss-Catalan numbers. We provide some combinatorial interpretations of these numbers, as well as describe their generating function.
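As background for the abstract, the classical Fuss-Catalan numbers are $A_n(p) = \frac{1}{(p-1)n+1}\binom{pn}{n}$, which reduce to $C_n$ at $p = 2$, and their generating function $B(x)$ satisfies $B(x) = 1 + x\,B(x)^p$. The sketch below computes them both ways as a consistency check. It covers only this classical generalization; the hypergraph Fuss-Catalan numbers studied in the paper are not implemented here, and the helper names are hypothetical.

```python
from math import comb


def fuss_catalan(n: int, p: int) -> int:
    """Fuss-Catalan number C(pn, n) / ((p - 1)n + 1); p = 2 gives the Catalan number C_n."""
    return comb(p * n, n) // ((p - 1) * n + 1)  # the division is always exact


def _mul_truncated(a: list[int], b: list[int], terms: int) -> list[int]:
    """Product of two power series, keeping coefficients of x^0 .. x^(terms-1)."""
    out = [0] * terms
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j in range(terms - i):
            out[i + j] += ai * b[j]
    return out


def series_coefficients(p: int, terms: int) -> list[int]:
    """First `terms` coefficients of B(x) solving B = 1 + x * B**p, by fixed-point iteration."""
    b = [1] + [0] * (terms - 1)
    for _ in range(terms):  # each pass fixes at least one more coefficient
        power = [1] + [0] * (terms - 1)
        for _ in range(p):
            power = _mul_truncated(power, b, terms)
        b = [1] + power[: terms - 1]  # multiplying by x shifts every coefficient up by one
    return b
```

For example, `[fuss_catalan(n, 3) for n in range(6)]` gives `[1, 1, 3, 12, 55, 273]`, the counts of ternary trees, and `series_coefficients(3, 6)` returns the same list.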

Preprint · 2020 · arXiv

Analyzing and Improving Neural Networks by Generating Semantic Counterexamples through Differentiable Rendering

Even as deep neural networks (DNNs) have achieved remarkable success on vision-related tasks, their performance is brittle to transformations in the input. Of particular interest are semantic transformations that model changes that have a basis in the physical world, such as rotations, translations, and changes in lighting or camera pose. In this paper, we show how differentiable rendering can be utilized to generate images that are informative, yet realistic, and which can be used to analyze DNN performance and improve its robustness through data augmentation. Given a differentiable renderer and a DNN, we show how to use off-the-shelf attacks from adversarial machine learning to generate semantic counterexamples: images where semantic features are changed so as to produce misclassifications or misdetections. We validate our approach on DNNs for image classification and object detection. For classification, we show that semantic counterexamples, when used to augment the dataset, (i) improve generalization performance, (ii) enhance robustness to semantic transformations, and (iii) transfer between models. Additionally, in comparison to sampling-based semantic augmentation, our technique generates more informative data in a sample-efficient manner.

Preprint · 2020 · arXiv

Exactly computing the tail of the Poisson-Binomial Distribution

We offer ShiftConvolvePoibin, a fast exact method to compute the tail of a Poisson-Binomial distribution (PBD). Our method employs an exponential shift to retain its accuracy when computing a tail probability, and in practice we find that it is immune to the significant relative errors that other methods, exact or approximate, can suffer from when computing very small tail probabilities of the PBD. The accompanying R package is also competitive with the fastest implementations for computing the entire PBD.
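For checking any fast tail method on small inputs, an exact reference is easy to build. The sketch below is not ShiftConvolvePoibin (which the abstract describes as using an exponential shift); it is the textbook O(n²) convolution recursion carried out in exact rational arithmetic with `fractions.Fraction`, so it is slow but has no rounding error even for tiny tail probabilities. Function names are hypothetical.

```python
from fractions import Fraction


def poisson_binomial_pmf(probs) -> list[Fraction]:
    """Exact PMF of S = sum of independent Bernoulli(p_i), via repeated convolution."""
    pmf = [Fraction(1)]  # before any trial, S = 0 with probability 1
    for p in probs:
        p = Fraction(p)  # floats convert exactly to their binary value
        q = 1 - p
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * q      # trial fails: count stays k
            nxt[k + 1] += mass * p  # trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


def poisson_binomial_tail(probs, s: int) -> Fraction:
    """Exact upper tail P(S >= s)."""
    if s <= 0:
        return Fraction(1)
    return sum(poisson_binomial_pmf(probs)[s:], Fraction(0))
```

For instance, ten trials with success probability 1/100 all succeeding has tail `poisson_binomial_tail([Fraction(1, 100)] * 10, 10) == Fraction(1, 100) ** 10`, i.e. exactly $10^{-20}$, a value a fast floating-point method can be compared against in relative terms.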

Preprint · 2010 · arXiv

The b quark mass from lattice nonrelativistic QCD

We present the first two-loop calculation of the heavy quark energy shift in lattice nonrelativistic QCD (NRQCD). This calculation allows us to extract a preliminary prediction of $m_b(m_b, n_f = 5) = 4.25(12)$ GeV for the mass of the b quark from lattice NRQCD simulations performed at a lattice spacing of $a = 0.12$ fm. Our result is an improvement on a previous determination of the b quark mass from unquenched lattice NRQCD simulations, which was limited by the use of one-loop expressions for the energy shift. Our value is in good agreement with recent results of $m_b(m_b) = 4.163(16)$ GeV from QCD sum rules and $m_b(m_b, n_f = 5) = 4.170(25)$ GeV from realistic lattice simulations using highly-improved staggered quarks. We employ a mixed strategy to simplify our calculation. Ghost, gluon and counterterm contributions to the energy shift and mass renormalisation are extracted from quenched high-beta simulations, whilst fermionic contributions are calculated using automated lattice perturbation theory. Our results demonstrate the effectiveness of such a strategy.
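One way to quantify "good agreement" is the difference between two results in units of their combined uncertainty. The short sketch below assumes independent, Gaussian errors added in quadrature; the abstract does not say how the comparison was judged, and the helper name is hypothetical. The input values are the three quoted in the abstract.

```python
import math


def tension_in_sigma(value_a: float, err_a: float, value_b: float, err_b: float) -> float:
    """|a - b| in units of the combined uncertainty (errors assumed independent)."""
    return abs(value_a - value_b) / math.hypot(err_a, err_b)


nrqcd = (4.25, 0.12)        # this work, m_b(m_b) in GeV
sum_rules = (4.163, 0.016)  # QCD sum rules, GeV
hisq = (4.170, 0.025)       # highly-improved staggered quark simulations, GeV

for label, ref in [("QCD sum rules", sum_rules), ("HISQ lattice", hisq)]:
    print(f"{label}: {tension_in_sigma(*nrqcd, *ref):.2f} sigma")
```

Under these assumptions both differences are below one standard deviation (about 0.72σ and 0.65σ), and the combined uncertainty is dominated by the 0.12 GeV error of the NRQCD result.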

Preprint · 2009 · arXiv

Radiative corrections to the m(oving)NRQCD action and heavy-light operators

Rare decays of B mesons, such as $B \to K^*\gamma$ and $B \to K^{(*)}\ell^+\ell^-$, are loop-suppressed in the Standard Model and sensitive to new physics. The final-state meson in heavy-light decays at large recoil has sizeable momentum in the rest frame of the decaying meson. To reduce the resulting discretization errors, we formulate the nonrelativistic heavy quark action in a moving frame. We discuss the perturbative renormalization of the leading-order heavy-light operators in the resulting theory, which is known as m(oving)NRQCD. We also present radiative corrections to the NRQCD action computed using automated lattice perturbation theory. By combining this technique with high-beta simulations in the weak-coupling regime of the theory, higher-order loop corrections can be calculated very efficiently.

Preprint · 2000 · arXiv

Intrinsic and Cosmological Signatures in Gamma-Ray Burst Time Profiles: Time Dilation

The time profiles of many gamma-ray bursts consist of distinct pulses, which offers the possibility of characterizing the temporal structure of these bursts using a relatively small set of pulse shape parameters. We have used a pulse decomposition procedure to analyze the Time-to-Spill (TTS) data for all bursts observed by BATSE up through trigger number 2000, in all energy channels for which TTS data is available. We obtain amplitude, rise and decay timescales, a pulse shape parameter, and the fluences of individual pulses in all of the bursts. We investigate the correlations between brightness measures (amplitude and fluence) and timescale measures (pulse width and separation) which may result from cosmological time dilation of bursts, or from intrinsic properties of burst sources or from selection effects. The effects of selection biases are evaluated through simulations. The correlations between these parameters among pulses within individual bursts give a measure of the intrinsic effects while the correlations among bursts could result both from intrinsic and cosmological effects. We find that timescales tend to be shorter in bursts with higher peak fluxes, as expected from cosmological time dilation effects, but also find that there are non-cosmological effects contributing to this inverse correlation. We find that timescales tend to be longer in bursts with higher total fluences, contrary to what is expected from cosmological effects. We also find that peak fluxes and total fluences of bursts are uncorrelated, indicating that they cannot both be good distance indicators for bursts.
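The abstract lists the fitted quantities (amplitude, rise and decay timescales, a shape parameter, fluence) but not the pulse function itself. A sketch, assuming the widely used Norris et al. (1996) form $I(t) = A\exp\!\big(-(|t - t_{\max}|/\sigma_{r,d})^{\nu}\big)$ with $\sigma_r$ before the peak and $\sigma_d$ after it, shows how fluence and width follow from those parameters: each side integrates to $\sigma\,\Gamma(1 + 1/\nu)$, and the intensity falls to half maximum at $|t - t_{\max}| = \sigma(\ln 2)^{1/\nu}$. The function names are hypothetical.

```python
import math


def pulse_intensity(t: float, amplitude: float, t_max: float,
                    sigma_rise: float, sigma_decay: float, nu: float) -> float:
    """Assumed Norris-style pulse: stretched exponential with separate rise/decay scales."""
    sigma = sigma_rise if t < t_max else sigma_decay
    return amplitude * math.exp(-(abs(t - t_max) / sigma) ** nu)


def pulse_fluence(amplitude: float, sigma_rise: float, sigma_decay: float, nu: float) -> float:
    """Time-integrated pulse: each side contributes sigma * Gamma(1 + 1/nu)."""
    return amplitude * (sigma_rise + sigma_decay) * math.gamma(1 + 1 / nu)


def pulse_fwhm(sigma_rise: float, sigma_decay: float, nu: float) -> float:
    """Full width at half maximum of the assumed pulse shape."""
    return (sigma_rise + sigma_decay) * math.log(2) ** (1 / nu)
```

With these closed forms, brightness measures (amplitude, fluence) and timescale measures (width) for each decomposed pulse can be tabulated directly from the fitted parameters before looking for correlations within and among bursts.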