Source author record

Lucas Dixon

Lucas Dixon appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence Logic in Computer Science Information Retrieval Machine Learning Symbolic Computation Cryptography and Security cs.CY Human-Computer Interaction math.CT Mathematical Software Networking and Internet Architecture

Catalog footprint

What is connected

11works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

Knowledge Distillation (KD) is a critical tool for training Large Language Models (LLMs), yet the majority of research focuses on approaches that rely solely on output logits, neglecting semantic information in the teacher's intermediate representations. While Hidden Layer Distillation (HLD) showed potential for encoder architectures, its application to decoder-only pre-training at scale remains largely unexplored. Through compute-controlled experiments, we benchmark HLD against logit-based KD and self-supervised baselines with Gemma3 3.4B as teacher and 123M and 735M students trained on up to 168B tokens from the C4 dataset. Our experiments show that HLD does not consistently outperform standard KD on downstream evaluation tasks. Nevertheless, we show that HLD can yield a systematic perplexity gain over KD across all shared-hyperparameter configurations, suggesting that a latent signal can be extracted, but a breakthrough may be needed for it to play a more significant role in LLM pre-training.

preprint2026arXiv

Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery

Artificial intelligence offers powerful new tools for scientific discovery, but the interaction paradigms required to effectively harness these systems remain underexplored. In this paper, we present findings from a formative user study with 11 expert mathematicians who used AlphaEvolve, an evolutionary coding agent, to tackle advanced problems in their fields of expertise. We identify and characterize a distinct workflow we term intentmaking, the iterative process of discovering, defining, and refining one's experimental goals through active system interaction. We frame this as a natural extension to sensemaking, the cognitive process of building an understanding of complex or novel data. We suggest that users enter a cycle of intentmaking (defining and updating their experiment) and sensemaking (interpreting the results) which repeats many times during the course of an investigation. Our documentation of these themes suggests an approach to designing AI tools for scientific discovery that goes beyond the existing question/answer model of many current systems, treating them as collaborative instruments rather than opaque black-box assistants.

preprint2022arXiv

Basic Elements of Logical Graphs

We considers how a particular kind of graph corresponds to multiplicative intuitionistic linear logic formula. The main feature of the graphical notation is that it absorbs certain symmetries between conjunction and implication. We look at the basic definitions and present details of an implementation in the functional programming language Standard ML. This provides a functional approach to graph traversal and demonstrates how graph isomorphism be implemented in just a few lines of readable code. This works takes the initial steps towards a graphical language and toolkit for working with logic formula and derivations.

preprint2022arXiv

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. The largest GLaM has 1.2 trillion parameters, which is approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation flops for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.

preprint2022arXiv

On Natural Language User Profiles for Transparent and Scrutable Recommendation

Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests.

preprint2021arXiv

Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still nascent. This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner. Inspired by recent progress in unpaired sequence-to-sequence tasks, a self-supervised learning model is introduced, called CAE-T5. CAE-T5 employs a pre-trained text-to-text transformer, which is fine tuned with a denoising and cyclic auto-encoder loss. Experimenting with the largest toxicity detection dataset to date (Civil Comments) our model generates sentences that are more fluent and better at preserving the initial content compared to earlier text style transfer systems which we compare with using several scoring systems and human evaluation.

preprint2020arXiv

Classifying Constructive Comments

We introduce the Constructive Comments Corpus (C3), comprised of 12,000 annotated news comments, intended to help build new tools for online communities to improve the quality of their discussions. We define constructive comments as high-quality comments that make a contribution to the conversation. We explain the crowd worker annotation scheme and define a taxonomy of sub-characteristics of constructiveness. The quality of the annotation scheme and the resulting dataset is evaluated using measurements of inter-annotator agreement, expert assessment of a sample, and by the constructiveness sub-characteristics, which we show provide a proxy for the general constructiveness concept. We provide models for constructiveness trained on C3 using both feature-based and a variety of deep learning approaches and demonstrate that these models capture general rather than topic- or domain-specific characteristics of constructiveness, through domain adaptation experiments. We examine the role that length plays in our models, as comment length could be easily gamed if models depend heavily upon this feature. By examining the errors made by each model and their distribution by length, we show that the best performing models are less correlated with comment length.The constructiveness corpus and our experiments pave the way for a moderation tool focused on promoting comments that make a contribution, rather than only filtering out undesirable content.

preprint2020arXiv

Toxicity Detection: Does Context Really Matter?

Moderation is crucial to promoting healthy on-line discussions. Although several `toxicity' detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments maybe judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improve performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of context to the previous post in the thread and the discussion title. We find that context can both amplify or mitigate the perceived toxicity of posts. Moreover, a small but significant subset of manually labeled posts (5% in one of our experiments) end up having the opposite toxicity labels if the annotators are not provided with context. Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers, having tried a range of classifiers and mechanisms to make them context aware. This points to the need for larger datasets of comments annotated in context. We make our code and data publicly available.

preprint2016arXiv

Network Traffic Obfuscation and Automated Internet Censorship

Internet censors seek ways to identify and block internet access to information they deem objectionable. Increasingly, censors deploy advanced networking tools such as deep-packet inspection (DPI) to identify such connections. In response, activists and academic researchers have developed and deployed network traffic obfuscation mechanisms. These apply specialized cryptographic tools to attempt to hide from DPI the true nature and content of connections. In this survey, we give an overview of network traffic obfuscation and its role in circumventing Internet censorship. We provide historical and technical background that motivates the need for obfuscation tools, and give an overview of approaches to obfuscation used by state of the art tools. We discuss the latest research on how censors might detect these efforts. We also describe the current challenges to censorship circumvention research and identify concrete ways for the community to address these challenges.

preprint2010arXiv

Open Graphs and Computational Reasoning

We present a form of algebraic reasoning for computational objects which are expressed as graphs. Edges describe the flow of data between primitive operations which are represented by vertices. These graphs have an interface made of half-edges (edges which are drawn with an unconnected end) and enjoy rich compositional principles by connecting graphs along these half-edges. In particular, this allows equations and rewrite rules to be specified between graphs. Particular computational models can then be encoded as an axiomatic set of such rules. Further rules can be derived graphically and rewriting can be used to simulate the dynamics of a computational system, e.g. evaluating a program on an input. Examples of models which can be formalised in this way include traditional electronic circuits as well as recent categorical accounts of quantum information.

preprint2010arXiv

Open Graphs and Monoidal Theories

String diagrams are a powerful tool for reasoning about physical processes, logic circuits, tensor networks, and many other compositional structures. The distinguishing feature of these diagrams is that edges need not be connected to vertices at both ends, and these unconnected ends can be interpreted as the inputs and outputs of a diagram. In this paper, we give a concrete construction for string diagrams using a special kind of typed graph called an open-graph. While the category of open-graphs is not itself adhesive, we introduce the notion of a selective adhesive functor, and show that such a functor embeds the category of open-graphs into the ambient adhesive category of typed graphs. Using this functor, the category of open-graphs inherits "enough adhesivity" from the category of typed graphs to perform double-pushout (DPO) graph rewriting. A salient feature of our theory is that it ensures rewrite systems are "type-safe" in the sense that rewriting respects the inputs and outputs. This formalism lets us safely encode the interesting structure of a computational model, such as evaluation dynamics, with succinct, explicit rewrite rules, while the graphical representation absorbs many of the tedious details. Although topological formalisms exist for string diagrams, our construction is discreet, finitary, and enjoys decidable algorithms for composition and rewriting. We also show how open-graphs can be parametrised by graphical signatures, similar to the monoidal signatures of Joyal and Street, which define types for vertices in the diagrammatic language and constraints on how they can be connected. Using typed open-graphs, we can construct free symmetric monoidal categories, PROPs, and more general monoidal theories. Thus open-graphs give us a handle for mechanised reasoning in monoidal categories.

Lucas Dixon

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery

Basic Elements of Logical Graphs

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

On Natural Language User Profiles for Transparent and Scrutable Recommendation

Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Classifying Constructive Comments

Toxicity Detection: Does Context Really Matter?

Network Traffic Obfuscation and Automated Internet Censorship

Open Graphs and Computational Reasoning

Open Graphs and Monoidal Theories