Researcher profile

Tom Kempton

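Tom Kempton's listed work spans fractal geometry and dynamical systems (Bernoulli convolutions, beta-expansions, self-affine sets) and, more recently, the detection of machine-generated text.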

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
6 topics
4 close collaborators

Published work

10 published items

Preprint · 2026 · arXiv

Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text

The ability to reliably distinguish human-written text from that generated by large language models is of profound societal importance. The dominant approach to this problem exploits the likelihood hypothesis: that machine-generated text should appear more probable to a detector language model than human-written text. However, we demonstrate that the token-level signal distinguishing human and machine text is non-uniform across the hidden space of the detector model, and that naively averaging likelihood-based token scores across regions with fundamentally different statistical structure, as most detectors do, causes a form of Simpson's paradox: a strong local signal is destroyed by inappropriate aggregation. To correct for this, we introduce a learned local calibration step grounded in Bayesian decision theory. Rather than aggregating raw token scores, we first learn lightweight predictors of the score distributions conditioned on position in hidden space, and aggregate calibrated log-likelihood ratios instead. This single intervention dramatically and consistently improves detection performance across all baseline detectors and all datasets we consider. For example, our calibrated variant of Fast-DetectGPT improves AUROC from $0.63$ to $0.85$ on GPT-5.4 text, and a locally calibrated DMAP detector we introduce achieves state-of-the-art performance across the board. That said, our central contribution is not a new detector, but a precise diagnosis of a significant cause of underperformance in existing detectors and a principled, modular remedy compatible with any token-averaging pipeline. This will serve as a foundation for the community to build upon, with natural avenues including richer distributional models, improved calibration strategies, and principled ensembling with hidden-space geometry signals via the full Bayes-optimal decision rule.
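The aggregation failure described in this abstract can be reproduced in a small toy simulation. The sketch below is hypothetical and is not the authors' detector: it replaces the learned hidden-space predictors with two discrete regions whose IDs are assumed known, models token scores in each region as one Gaussian per class, and compares the raw average token score with the average per-token log-likelihood ratio. The score distributions, region proportions, and all function names (`sample_doc`, `fit_region_gaussians`, `calibrated_score`, `auroc`, `split_by_label`) are assumptions made for illustration.

```python
import numpy as np

# Toy illustration only (not the paper's detector): two hypothetical hidden-space
# regions with different baseline token scores; region IDs are assumed known.
rng = np.random.default_rng(0)
BASE = np.array([0.0, 3.0])  # assumed baseline token score in region 0 and region 1
SHIFT = 0.5                  # assumed: machine tokens score 0.5 higher within each region


def sample_doc(is_machine, n_tokens=200):
    """Return (region ids, token scores) for one synthetic document."""
    # Assumed confound: machine documents put fewer tokens in high-baseline region 1.
    p_region1 = 0.3 if is_machine else 0.7
    regions = (rng.random(n_tokens) < p_region1).astype(int)
    scores = BASE[regions] + SHIFT * is_machine + rng.normal(size=n_tokens)
    return regions, scores


def fit_region_gaussians(docs, labels):
    """Fit (mean, std) of token scores for each (region, class) pair; class 1 = machine."""
    params = {}
    for r in (0, 1):
        for c in (0, 1):
            s = np.concatenate(
                [sc[reg == r] for (reg, sc), y in zip(docs, labels) if y == c]
            )
            params[(r, c)] = (s.mean(), s.std())
    return params


def log_normal_pdf(x, mu, sd):
    """Log density of a normal distribution with mean mu and standard deviation sd."""
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)


def calibrated_score(regions, scores, params):
    """Mean per-token log-likelihood ratio log p(s | machine, r) - log p(s | human, r)."""
    llr = np.empty_like(scores)
    for r in (0, 1):
        mask = regions == r
        llr[mask] = log_normal_pdf(scores[mask], *params[(r, 1)]) - log_normal_pdf(
            scores[mask], *params[(r, 0)]
        )
    return llr.mean()


def auroc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = np.asarray(pos)[:, None]
    neg = np.asarray(neg)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def split_by_label(values, labels):
    """Return (machine values, human values)."""
    machine = [v for v, y in zip(values, labels) if y == 1]
    human = [v for v, y in zip(values, labels) if y == 0]
    return machine, human


labels = [0, 1] * 100
params = fit_region_gaussians([sample_doc(y) for y in labels], labels)
test_docs = [sample_doc(y) for y in labels]

raw = [sc.mean() for _, sc in test_docs]
cal = [calibrated_score(reg, sc, params) for reg, sc in test_docs]
print("raw-average AUROC:", auroc(*split_by_label(raw, labels)))
print("calibrated-LLR AUROC:", auroc(*split_by_label(cal, labels)))
```

In this constructed setting, machine documents place more tokens in the low-baseline region, so the raw average should rank them below human documents even though machine tokens score higher within each region. Averaging region-conditional log-likelihood ratios compares each token only against its own region's distributions, which removes that confound.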

Preprint · 2015 · arXiv

Bernoulli Convolutions and 1D Dynamics

We describe a family $ϕ_λ$ of dynamical systems on the unit interval which preserve Bernoulli convolutions. We show that if there are parameter ranges for which these systems are piecewise convex, then the corresponding Bernoulli convolution will be absolutely continuous with bounded density. We study the systems $ϕ_λ$ and give some numerical evidence to suggest values of $λ$ for which $ϕ_λ$ may be piecewise convex.

Preprint · 2015 · arXiv

The dimension of projections of self-affine sets and measures

Let $E$ be a plane self-affine set defined by affine transformations with linear parts given by matrices with positive entries. We show that if $\mu$ is a Bernoulli measure on $E$ with $\dim_H \mu = \dim_L \mu$, where $\dim_H$ and $\dim_L$ denote Hausdorff and Lyapunov dimensions, then the projection of $\mu$ in all but at most one direction has Hausdorff dimension $\min\{\dim_H \mu, 1\}$. We transfer this result to sets and show that many self-affine sets have projections of dimension $\min\{\dim_H E, 1\}$ in all but at most one direction.

Preprint · 2013 · arXiv

Sets of beta-expansions and the Hausdorff Measure of Slices through Fractals

We study natural measures on sets of beta-expansions and on slices through self-similar sets. In the setting of beta-expansions, these allow us to better understand the measure of maximal entropy for the random beta-transformation and to reinterpret a result of Lindenstrauss, Peres and Schlag in terms of equidistribution. Each of these applications is relevant to the study of Bernoulli convolutions. In the fractal setting, this allows us to understand how to disintegrate Hausdorff measure by slicing, leading to conditions under which almost every slice through a self-similar set has positive Hausdorff measure, generalising long-known results about almost-everywhere values of the Hausdorff dimension.

Preprint · 2012 · arXiv

Counting Beta Expansions and the Absolute Continuity of Bernoulli Convolutions

We study the typical growth rate of the number of words of length $n$ which can be extended to beta-expansions of $x$. In the general case we give a lower bound for the growth rate, while in the case that the Bernoulli convolution associated to parameter $\beta$ is absolutely continuous we are able to give the growth rate precisely. This gives new necessary and sufficient conditions for the absolute continuity of Bernoulli convolutions.
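The count studied in this abstract can be computed by brute force for small $n$. The sketch below assumes digits in $\{0, 1\}$, $1 < \beta \le 2$ and $x \in [0, 1/(\beta - 1)]$, and uses the standard criterion that a word $a_1 \dots a_n$ extends to a beta-expansion of $x$ exactly when the rescaled remainder $\beta^n \left(x - \sum_{i \le n} a_i \beta^{-i}\right)$ lies in $[0, 1/(\beta - 1)]$, the range of values a $0/1$ tail can represent when $\beta \le 2$. The function name `prefix_counts` and the parameter values in the usage lines are hypothetical; this is an enumeration for intuition, not a method from the paper, and because it stores one floating-point remainder per surviving word it is only practical for modest $n$.

```python
import math


def prefix_counts(beta, x, n_max):
    """Return [N_1, ..., N_{n_max}], where N_n is the number of 0/1 words
    a_1 ... a_n that extend to a beta-expansion x = sum_{i>=1} a_i * beta**(-i).

    Assumes 1 < beta <= 2 and 0 <= x <= 1 / (beta - 1). Brute force: the list
    of remainders holds N_n entries, so keep n_max small.
    """
    # Largest value a 0/1 tail sum_{j>=1} b_j * beta**(-j) can represent.
    upper = 1.0 / (beta - 1.0)
    # One rescaled remainder beta**n * (x - sum_{i<=n} a_i * beta**(-i)) per word.
    remainders = [x]
    counts = []
    for _ in range(n_max):
        extended = []
        for r in remainders:
            for digit in (0, 1):
                r_next = beta * r - digit
                # The longer word stays extendable iff a 0/1 tail can still reach r_next.
                if 0.0 <= r_next <= upper:
                    extended.append(r_next)
        remainders = extended
        counts.append(len(remainders))
    return counts


counts = prefix_counts(1.8, 0.5, 12)
for n, c in enumerate(counts, start=1):
    # log(N_n) / n is a crude finite-n estimate of the exponential growth rate.
    print(n, c, round(math.log(c) / n, 4))
```

Since every extendable word has at least one extendable continuation, $N_n$ is non-decreasing, and since there are only $2^n$ words, the estimate $\log N_n / n$ never exceeds $\log 2$. For $\beta = 2$ and a point with a unique binary expansion, every $N_n$ equals 1.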