Researcher profile

James Oldfield

James Oldfield contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified · Verification L1 · Unclaimed author
5 works
0 followers
3 topics
4 close collaborators




Published work

5 published items

Preprint · 2026 · arXiv

fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery

Many features in pretrained Transformers span multiple layers: they emerge through stages of inference, persist in the residual stream, or are built jointly by parallel MLPs. Crosscoders (namely, sparse dictionaries trained jointly across layers) aim to recover these cross-layer features in a single shared latent space. We show that standard crosscoders largely fail at this purpose. Although their decoder weight norms spread evenly across layers, a functional coherence metric we introduce reveals that each latent's activation is effectively driven by only one or two layers on average. While functionally coherent latents act as human-interpretable concept detectors (e.g., US states and cities), the layer-localized latents that crosscoders predominantly learn collapse onto surface-level patterns such as digit detectors. We trace this failure to two structural limitations: unconstrained cross-layer parameterization and unregularized cross-layer dependence. We address both by introducing fmxcoders, which (i) replace the encoder and decoder with low-rank tensor factorizations that draw every latent's per-layer weights from a shared cross-layer basis, and (ii) apply stochastic layer masking, a denoising regularizer along the layer axis that penalizes latents whose contribution collapses when a single layer is masked. Across GPT2-Small, Pythia-410M, Pythia-1.4B, and Gemma2-2B, fmxcoders lift mean probing F1 by 10-30 points, surpassing per-layer SAE baselines that standard crosscoders fail to reach, reduce reconstruction MSE by 25-50%, and roughly double mean functional coherence. An LLM-as-a-judge evaluation further shows that fmxcoders recover 3-13× more semantically coherent latents than standard crosscoders across all four base LLMs.
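The two fmxcoders ingredients named above can be illustrated in miniature. This is a hedged sketch, not the authors' implementation: it assumes a hypothetical factorization in which each latent's per-layer decoder vectors are a weighted sum of `rank` shared cross-layer basis tensors, a single ReLU encoder that reads all layers jointly, and a denoising loss that zeroes one randomly chosen layer's input while still reconstructing every clean layer. All function names and shapes (`factorized_decoder`, `mask_random_layer`, `masked_denoising_loss`, etc.) are illustrative assumptions.

```python
import random

# Hypothetical sketch: names, shapes and the exact factorization are assumptions.


def factorized_decoder(coeffs, basis):
    """Per-latent, per-layer decoder vectors drawn from a shared cross-layer basis.

    coeffs: [n_latents][rank] per-latent mixing weights.
    basis:  [rank][n_layers][d_model] basis tensors shared by all latents.
    Returns W with W[k][l] = sum_r coeffs[k][r] * basis[r][l].
    """
    rank, n_layers, d_model = len(basis), len(basis[0]), len(basis[0][0])
    return [
        [[sum(c[r] * basis[r][l][j] for r in range(rank)) for j in range(d_model)]
         for l in range(n_layers)]
        for c in coeffs
    ]


def mask_random_layer(acts, rng):
    """Stochastic layer masking: zero the activations of one randomly chosen layer."""
    dropped = rng.randrange(len(acts))
    masked = [[0.0] * len(a) if i == dropped else list(a) for i, a in enumerate(acts)]
    return masked, dropped


def encode(acts, enc, bias):
    """z_k = ReLU(sum_l <enc[k][l], x_l> + bias[k]): each latent reads every layer."""
    return [
        max(0.0, sum(sum(w * x for w, x in zip(enc[k][l], acts[l]))
                     for l in range(len(acts))) + bias[k])
        for k in range(len(enc))
    ]


def decode(z, W):
    """x_hat_l = sum_k z_k * W[k][l] for every layer l."""
    n_layers, d_model = len(W[0]), len(W[0][0])
    return [[sum(z[k] * W[k][l][j] for k in range(len(z))) for j in range(d_model)]
            for l in range(n_layers)]


def masked_denoising_loss(acts, enc, bias, W, rng):
    """Encode from a layer-masked input, then score reconstruction of ALL clean layers."""
    masked, _ = mask_random_layer(acts, rng)
    recon = decode(encode(masked, enc, bias), W)
    n = sum(len(a) for a in acts)
    return sum((r - x) ** 2 for ra, a in zip(recon, acts) for r, x in zip(ra, a)) / n
```

Under this toy objective, a latent whose output depends on a single layer reconstructs poorly whenever that layer is the one dropped, which is the kind of collapse the regularizer is described as penalizing.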

Preprint · 2026 · arXiv

Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders

Multilayer perceptrons (MLPs) are an integral part of large language models, yet their dense representations render them difficult to understand, edit, and steer. Recent methods learn interpretable approximations via neuron-level sparsity, yet fail to faithfully reconstruct the original mapping, significantly increasing the model's next-token cross-entropy loss. In this paper, we advocate for moving to layer-level sparsity to overcome the accuracy trade-off in sparse layer approximation. Under this paradigm, we introduce Mixture of Decoders (MxDs). MxDs generalize MLPs and Gated Linear Units, expanding pre-trained dense layers into tens of thousands of specialized sublayers. Through a flexible form of tensor factorization, each sparsely activating MxD sublayer implements a linear transformation with full-rank weights, preserving the original decoders' expressive capacity even under heavy sparsity. Experimentally, we show that MxDs significantly outperform state-of-the-art methods (e.g., Transcoders) on the sparsity-accuracy frontier in language models with up to 3B parameters. Further evaluations on sparse probing and feature steering demonstrate that MxDs learn similarly specialized features of natural language, opening up a promising new avenue for designing interpretable yet faithful decompositions. Our code is available at: https://github.com/james-oldfield/MxD/.
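Why many full-rank sublayers need not be expensive can be illustrated with one factorization that has this property. In the sketch below, which is an assumption for illustration and not necessarily the parameterization used in MxDs, expert n's weight matrix is `diag(C[n]) @ D` for a shared matrix `D` and a per-expert scaling vector `C[n]`; it is full rank whenever `D` is and `C[n]` has no zero entries. The sparse gate vector `gates` is taken as given, and all names are hypothetical. A naive loop over explicitly built expert matrices and the factorized form `(sum_n gates[n] * C[n]) ⊙ (D z)` produce the same output, but the latter needs only one matrix-vector product.

```python
# Hypothetical sketch: the diag(C[n]) @ D expert form is an illustrative assumption.


def matvec(M, v):
    """Dense matrix-vector product for a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def expert_weight(D, c_n):
    """W_n = diag(c_n) @ D: full rank if D is and no entry of c_n is zero."""
    return [[c * d for d in row] for c, row in zip(c_n, D)]


def mixture_naive(z, gates, D, C):
    """sum_n gates[n] * (W_n @ z), materialising every active expert's weights."""
    out = [0.0] * len(D)
    for a_n, c_n in zip(gates, C):
        if a_n == 0.0:
            continue  # inactive sublayer: skipped under a sparse gate
        y_n = matvec(expert_weight(D, c_n), z)
        out = [o + a_n * y for o, y in zip(out, y_n)]
    return out


def mixture_factorized(z, gates, D, C):
    """Same output via (sum_n gates[n] * C[n]) multiplied elementwise with (D @ z)."""
    mix = [sum(a_n * c_n[o] for a_n, c_n in zip(gates, C)) for o in range(len(D))]
    return [m * y for m, y in zip(mix, matvec(D, z))]


D = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0], [2.0, 0.5], [1.0, 3.0]]
gates = [0.5, 0.0, 2.0]
print(mixture_factorized([1.0, 1.0], gates, D, C))  # [7.5, 45.5]
```

The equivalence follows because every expert shares `D`, so the gate-weighted sum can be taken over the small scaling vectors before, rather than after, the matrix product.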

Preprint · 2022 · arXiv

ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences

This work addresses the problem of discovering non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner. In the proposed method, the discovery is driven by a set of pairs of natural language sentences with contrasting semantics, named semantic dipoles, that serve as the limits of the interpretation that we require the trainable latent paths to encode. Using the pre-trained CLIP encoder, the sentences are projected into the vision-language space, where they serve as dipoles and where RBF-based warping functions define a set of non-linear directional paths, one for each semantic dipole, thus allowing traversals from one semantic pole to the other. By defining an objective that discovers paths in the latent space of GANs that generate changes along the desired paths in the vision-language embedding space, we provide an intuitive way of controlling the underlying generative factors and address some of the limitations of state-of-the-art works, namely, that a) they are typically tailored to specific GAN architectures (i.e., StyleGAN), b) they disregard the relative position of the manipulated and the original image in the image embedding and the relative position of the image and the text embeddings, and c) they lead to abrupt image manipulations and quickly arrive at regions of low density and, thus, low image quality, providing limited control of the generative factors. We provide extensive qualitative and quantitative results that demonstrate our claims with two pre-trained GANs, and make the code and the pre-trained models publicly available at: https://github.com/chi0tzp/ContraCLIP
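The role of an RBF-based warping function can be illustrated in a toy 2-D embedding space. This sketch assumes, for illustration only, that a semantic dipole is represented by two pole embeddings `pole_pos` and `pole_neg`, and that the warping function is a difference of Gaussian RBFs centred on them; its normalised gradient then gives a position-dependent direction leading from one pole toward the other. The function names, the bandwidth `gamma`, and the step size `eps` are hypothetical, and the actual ContraCLIP objective (which learns latent paths in the GAN's latent space to follow such directions) is not reproduced here.

```python
import math

# Hypothetical sketch: the difference-of-RBFs form and all names are assumptions.


def warp_gradient(e, pole_pos, pole_neg, gamma=1.0):
    """Gradient of f(e) = exp(-gamma*|e-p+|^2) - exp(-gamma*|e-p-|^2)."""
    d_pos = [a - b for a, b in zip(e, pole_pos)]
    d_neg = [a - b for a, b in zip(e, pole_neg)]
    k_pos = math.exp(-gamma * sum(x * x for x in d_pos))
    k_neg = math.exp(-gamma * sum(x * x for x in d_neg))
    # d/de exp(-gamma*|e-p|^2) = -2*gamma*(e-p)*exp(-gamma*|e-p|^2)
    return [-2 * gamma * (dp * k_pos - dn * k_neg) for dp, dn in zip(d_pos, d_neg)]


def traverse(e, pole_pos, pole_neg, steps=10, eps=0.2, gamma=1.0):
    """Follow the normalised warping gradient; returns every visited point."""
    path = [list(e)]
    for _ in range(steps):
        g = warp_gradient(path[-1], pole_pos, pole_neg, gamma)
        norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
        path.append([p + eps * x / norm for p, x in zip(path[-1], g)])
    return path


path = traverse([-1.0, 0.0], pole_pos=[1.0, 0.0], pole_neg=[-1.0, 0.0])
```

Because the gradient depends on the current position, such paths are in general curved rather than a single fixed linear direction; in this symmetric toy case the path happens to be straight, running from the negative pole to the positive one.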

Preprint · 2021 · arXiv

Adversarial Learning of Disentangled and Generalizable Representations for Visual Attributes

Recently, a multitude of methods for image-to-image translation have demonstrated impressive results on problems such as multi-domain or multi-attribute transfer. The vast majority of such works leverage the strengths of adversarial learning and deep convolutional autoencoders to achieve realistic results by capturing the target data distribution well. Nevertheless, the most prominent representatives of this class of methods do not facilitate semantic structure in the latent space and usually rely on binary domain labels for test-time transfer. This leads to rigid models, unable to capture the variance of each domain label. In this light, we propose a novel adversarial learning method that (i) facilitates the emergence of latent structure by semantically disentangling sources of variation, and (ii) encourages learning generalizable, continuous, and transferable latent codes that enable flexible attribute mixing. This is achieved by introducing a novel loss function that encourages representations to result in uniformly distributed class posteriors for disentangled attributes. In tandem with an algorithm for inducing generalizable properties, the resulting representations can be utilized for a variety of tasks such as intensity-preserving multi-attribute image translation and synthesis, without requiring labelled test data. We demonstrate the merits of the proposed method by a set of qualitative and quantitative experiments on popular databases such as MultiPIE, RaFD, and BU-3DFE, where our method outperforms other state-of-the-art methods in tasks such as intensity-preserving multi-attribute transfer and synthesis.
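A loss that encourages uniformly distributed class posteriors can be written, as one common choice assumed here for illustration, as the cross-entropy between the uniform distribution over K classes and a classifier's softmax posterior. It reaches its minimum, log K, exactly when the posterior is uniform, that is, when the classifier cannot read the attribute from the representation. The function names and the use of raw logits are assumptions; the paper's exact formulation may differ.

```python
import math

# Hypothetical sketch: cross-entropy to the uniform distribution is an assumed choice.


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def uniform_posterior_loss(logits):
    """H(U, p) = -(1/K) * sum_k log p_k; minimised (= log K) when p is uniform."""
    logp = log_softmax(logits)
    return -sum(logp) / len(logp)
```

Since H(U, p) = log K + KL(U || p), any deviation of the posterior from uniform strictly increases the loss, so minimising it with respect to the representation removes class-discriminative information for that attribute.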

Preprint · 2020 · arXiv

Enhancing Facial Data Diversity with Style-based Face Aging

A significant limiting factor in training fair classifiers relates to the presence of dataset bias. In particular, face datasets are typically biased in terms of attributes such as gender, age, and race. If not mitigated, bias leads to algorithms that exhibit unfair behaviour towards such groups. In this work, we address the problem of increasing the diversity of face datasets with respect to age. Concretely, we propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns by conditioning on multi-resolution age-discriminative representations. By evaluating on several age-annotated datasets in both single- and cross-database experiments, we show that the proposed method outperforms state-of-the-art algorithms for age transfer, especially in the case of age groups that lie in the tails of the label distribution. We further show significantly increased diversity in the augmented datasets, outperforming all compared methods according to established metrics.
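Style-based generators commonly inject a conditioning signal through adaptive instance normalization (AdaIN), which re-normalizes each feature channel and then applies a per-channel scale and shift. The sketch below shows only that generic operation and assumes, hypothetically, that an age-discriminative representation has already been mapped to the `scale` and `shift` vectors; the abstract does not specify how the proposed architecture performs this conditioning.

```python
import math

# Hypothetical sketch: mapping age features to (scale, shift) is an assumption.


def adain(features, scale, shift, eps=1e-5):
    """Adaptive instance normalization over a [channels][positions] feature map."""
    out = []
    for channel, s, b in zip(features, scale, shift):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Normalise to zero mean and unit variance, then impose the conditioning statistics
        out.append([s * (x - mean) / std + b for x in channel])
    return out
```

After the call, each output channel has mean equal to its `shift` value and standard deviation close to its `scale` value, which is how a style code controls the generated content at that resolution.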