Researcher profile

David S. Berman

David S. Berman contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
7 works
0 followers
5 topics
4 close collaborators

Published work

7 published item(s)

preprint | 2026 | arXiv

A path to natural language through tokenisation and transformers

Natural languages exhibit striking regularities in their statistical structure, including notably the emergence of Zipf's and Heaps' laws. Despite this, it remains broadly unclear how these properties relate to the modern tokenisation schemes used in contemporary transformer models. In this note, we analyse the information content (as measured by the Shannon entropy) of various corpora under the assumption of a Zipfian frequency distribution, and derive a closed-form expression for the slot entropy expectation value. We then empirically investigate how byte-pair encoding (BPE) transforms corpus statistics, showing that recursive applications of BPE drive token frequencies toward a Zipfian power law while inducing a characteristic growth pattern in empirical entropy. Using the ability of transformers to learn context-dependent token probability distributions, we train language models on corpora tokenised at varying BPE depths, revealing that the model predictive entropies increasingly agree with Zipf-derived predictions as the BPE depth increases. Attention-based diagnostics further indicate that deeper tokenisation reduces local token dependencies, bringing the empirical distribution closer to the weakly dependent (near-IID) regime. Together, these results clarify how BPE acts not only as a compression mechanism but also as a statistical transform that reconstructs key informational properties of natural language.
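
As a rough illustration of the quantity this abstract refers to, the sketch below computes the Shannon entropy of an idealised Zipfian token distribution numerically. It is not the closed-form expression derived in the paper. The function names, the vocabulary sizes and the default exponent of 1 are assumptions chosen for the example.

```python
import math


def zipf_probabilities(vocab_size, exponent=1.0):
    """Normalised Zipfian probabilities p(r) proportional to r**(-exponent).

    Ranks run from 1 to vocab_size. The exponent is a free parameter here
    (assumption: 1.0 is only a conventional default, not a fitted value).
    """
    weights = [rank ** -exponent for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum p * log2(p), measured in bits per token."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Compare the Zipfian entropy with the uniform upper bound log2(V).
    for vocab_size in (100, 10_000):
        h_zipf = shannon_entropy_bits(zipf_probabilities(vocab_size))
        h_max = math.log2(vocab_size)
        print(f"V={vocab_size}: Zipf entropy {h_zipf:.3f} bits, uniform bound {h_max:.3f} bits")
```

The Zipfian entropy always sits below the uniform bound for a positive exponent, and it grows as the vocabulary grows, which is the kind of dependence on vocabulary size that a tokenisation depth would change.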

preprint | 2022 | arXiv

Machine Learning Calabi-Yau Hypersurfaces

We revisit the classic database of weighted-P4s which admit Calabi-Yau 3-fold hypersurfaces equipped with a diverse set of tools from the machine-learning toolbox. Unsupervised techniques identify an unanticipated almost linear dependence of the topological data on the weights. This then allows us to identify a previously unnoticed clustering in the Calabi-Yau data. Supervised techniques are successful in predicting the topological parameters of the hypersurface from its weights with an accuracy of R^2 > 95%. Supervised learning also allows us to identify weighted-P4s which admit Calabi-Yau hypersurfaces to 100% accuracy by making use of partitioning supported by the clustering behaviour.

preprint | 2022 | arXiv

On the Dynamics of Inference and Learning

Statistical inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available, this probability distribution is updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first-order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show that for networks exhibiting small final losses the simple power-law is also satisfied.
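
A minimal sketch of the $1/T$ behaviour in the simplest case, assuming a Gaussian likelihood with known noise variance and a Gaussian prior on the mean. This is the standard discrete conjugate update, not the continuous flow equation of the paper, and the function names and default values are assumptions made for the example.

```python
import random


def update_gaussian_mean(prior_mean, prior_var, observation, noise_var):
    """One Bayes update for the mean of a Gaussian with known noise variance.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the observation.
    """
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return post_mean, post_var


def posterior_variances(observations, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Posterior variance after each observation, T = 1, 2, ..., len(observations)."""
    mean, var = prior_mean, prior_var
    variances = []
    for x in observations:
        mean, var = update_gaussian_mean(mean, var, x, noise_var)
        variances.append(var)
    return variances


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: 1000 draws from a Gaussian with true mean 2 and unit variance.
    data = [random.gauss(2.0, 1.0) for _ in range(1000)]
    variances = posterior_variances(data)
    for t in (10, 100, 1000):
        # T * variance approaches noise_var, i.e. the variance falls off as 1/T.
        print(f"T={t}: posterior variance {variances[t - 1]:.6f}, T * variance {t * variances[t - 1]:.4f}")
```

In this conjugate setting the posterior variance after $T$ observations is $1/(1/\sigma_0^2 + T/\sigma^2)$, so for large $T$ it scales as $\sigma^2/T$ regardless of the observed values.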

preprint | 2022 | arXiv

Twisted Self-duality

We examine a generalisation of the usual self-duality equations for Yang-Mills theory when the colour space admits a non-trivial involution. This involution allows us to construct a non-trivial twist which may be combined with the Hodge star to form a twisted self-dual curvature. We will construct a simple example of twisted self-duality for $su(2) \oplus su(2)$ gauge theory along with its explicit solutions and then dimensionally reduce from four dimensions to obtain families of non-trivial non-linear equations in lower dimensions. This twisted self-duality constraint will be shown to arise in $E_7$ exceptional field theory through a Scherk-Schwarz reduction and we will show how an Eguchi-Hanson gravitational instanton also obeys the twisted self-duality condition.

preprint | 2020 | arXiv

Reductions of Exceptional Field Theories

Double Field Theory (DFT) and Exceptional Field Theory (EFT), collectively called ExFTs, have proven to be a remarkably powerful new framework for string and M-theory. Exceptional field theories were constructed on a case by case basis as often each EFT has its own idiosyncrasies. Intuitively though, an $E_{n-1(n-1)}$ EFT must be contained in an $E_{n(n)}$ ExFT. In this paper we propose a generalised Kaluza-Klein ansatz to relate different ExFTs. We then discuss in more detail the different aspects of the relationship between various ExFTs including the coordinates, section condition and (pseudo)-Lagrangian densities. For the $E_{8(8)}$ EFT we describe a generalisation of the Mukhi-Papageorgakis mechanism to relate the d = 3 topological term in the $E_{8(8)}$ EFT to a Yang-Mills action in the $E_{7(7)}$ EFT.

preprint | 2020 | arXiv

S-duality and the Double Copy

The double copy formalism provides an intriguing connection between gauge theories and gravity. It was first demonstrated in the perturbative context of scattering amplitudes but recently the formalism has been applied to exact classical solutions in gauge theories such as the monopole and instanton. In this paper we will investigate how duality symmetries in the gauge theory double copy to gravity and relate these to solution generating transformations and the action of $Sl(2,R)$ in general relativity.