Researcher profile

Jonathan Roberts

Jonathan Roberts contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot


Trust 15 (Unverified) · Verification L1 · Unclaimed author
3 works
0 followers
3 topics
4 close collaborators




Published work

3 published items

Preprint · 2026 · arXiv

How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers

Frontier LLMs are increasingly utilised across academia, society and industry. A commonly used unit for comparing models, their inputs and outputs, and estimating inference pricing is the token. In general, tokens are used as a stable currency, assumed to be broadly consistent across tokenizers and contexts, enabling direct comparisons. However, tokenization varies significantly across models and domains of text, making naive interpretation of token counts problematic. We quantify this variation by providing a comprehensive empirical analysis of tokenization, exploring the compression of sequences to tokens across different distributions of textual data. Our analysis challenges commonly held heuristics about token lengths, finding them to be overly simplistic. We hope the insights of our study add clarity and intuition toward tokenization in contemporary LLMs.

Preprint · 2020 · arXiv

Composition and Configuration Patterns in Multiple-View Visualizations

Multiple-view visualization (MV) is a layout design technique often employed to help users see a large number of data attributes and values in a single cohesive representation. Because of its generalizability, the MV design has been widely adopted by the visualization community to help users examine and interact with large, complex, and high-dimensional data. However, although MVs are ubiquitous, there has been little work to categorize and analyze them in order to better understand their design space. As a result, there are few, if any, guidelines on how to use the MV design effectively. In this paper, we present an in-depth study of how MVs are designed in practice. We focus on two fundamental measures of multiple-view patterns: composition, which quantifies which view types are present and how many of each there are; and configuration, which characterizes the spatial arrangement of view layouts in the display space. We build a new dataset containing 360 images of MVs collected from IEEE VIS, EuroVis, and PacificVis publications from 2011 to 2019, and make fine-grained annotations of view types and layouts for these visualization images. From this data we conduct composition and configuration analyses using quantitative metrics of term frequency and layout topology. We identify common practices around MVs, including relationships among view types, popular view layouts, and correlations between view types and layouts. We combine the findings into an MV recommendation system, providing interactive tools to explore the design space and support example-based design.

Preprint · 2020 · arXiv

Powderday: Dust Radiative Transfer for Galaxy Simulations

We present Powderday, a flexible, fast, open-source dust radiative transfer package designed to interface with galaxy formation simulations. Powderday builds on FSPS population synthesis models and Hyperion dust radiative transfer, and employs yt to interface between different software packages. We perform stellar population synthesis modeling on the fly, which allows for significant run-time flexibility in the assumed stellar physics. We include a model for nebular line emission that can employ either precomputed Cloudy lookup tables (for efficiency) or direct photoionization calculations for all young stars (for flexibility). The dust content follows either observationally motivated prescriptions, direct modeling from galaxy formation simulations, or a novel approach that includes the dust content via learning-based algorithms trained on the SIMBA cosmological galaxy formation simulation. AGN can additionally be included via a range of prescriptions. The outputs of these models are broadband SEDs, as well as filter-convolved images. Powderday is designed to eliminate last-mile efforts by researchers who employ different hydrodynamic galaxy formation models, and it interfaces seamlessly with GIZMO, AREPO, GASOLINE, CHANGA, and ENZO. We demonstrate the capabilities of the code via three applications: a model for the star formation rate (SFR)–infrared luminosity relation in galaxies (including the impact of AGN); the impact of circumstellar dust around AGB stars on the mid-infrared emission in galaxy SEDs; and the impact of galaxy inclination angle on dust attenuation laws.