Source author record

Christopher Yau

Christopher Yau appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Computation, Methodology, Applications, Artificial Intelligence, Computer Vision, eess.SP, Genomics

Catalog footprint

What is connected

9 works

8 topics

4 close collaborators


Preprint, 2026, arXiv

Martingale-Consistent Self-Supervised Learning

Self-supervised learning (SSL) is often deployed under changing information, such as shorter histories, missing features, or partially observed images. In these settings, predictions from coarse and refined views should be coherent: before refinement, the coarse-view prediction should match the average prediction expected after refinement. Martingales formalize this coherence principle, but standard SSL objectives do not enforce it. Unlike invariance objectives that pull views together, martingale consistency constrains only the expected refined prediction, allowing predictions to update as information is revealed while preventing systematic drift. We introduce a martingale-consistent SSL framework that closes this gap, with practical prediction- and latent-space variants and an unbiased two-sample Monte Carlo estimator based on stochastic refinement. We evaluate the approach on synthetic and real time-series, tabular, and image benchmarks under partial-observation regimes, in both semi-self-supervised and fully label-free settings. Across these experiments, our framework improves robustness and calibration under partial observation, yielding more stable representations as information is revealed.
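The coherence condition can be read as a penalty on the gap between the coarse-view prediction and the expected refined-view prediction. The sketch below is an illustration under assumptions, not the paper's implementation: the squared-gap penalty, the function names, and the toy Gaussian "refinements" are all hypothetical. It relies on a standard fact: with two independent refinements, the product of the two residuals is an unbiased estimate of the squared gap, whereas squaring a single residual also counts the variance of the refined prediction.

```python
import numpy as np

def two_sample_gap(coarse_pred, refined_a, refined_b):
    """Unbiased estimate of ||coarse - E[refined]||^2 from two independent refinements."""
    return float(np.dot(coarse_pred - refined_a, coarse_pred - refined_b))

def one_sample_gap(coarse_pred, refined):
    """Biased alternative: also counts the variance of the refined prediction."""
    return float(np.sum((coarse_pred - refined) ** 2))

# Toy check (all values hypothetical): refined predictions scatter around a mean,
# and the coarse prediction equals that mean, so the true gap is zero.
rng = np.random.default_rng(0)
mean_refined = np.array([0.2, 0.8])
coarse = np.array([0.2, 0.8])
noise = 0.1
n = 20000

a = mean_refined + noise * rng.standard_normal((n, 2))
b = mean_refined + noise * rng.standard_normal((n, 2))

two_sample_mean = np.mean([two_sample_gap(coarse, x, y) for x, y in zip(a, b)])
one_sample_mean = np.mean([one_sample_gap(coarse, x) for x in a])
print(two_sample_mean, one_sample_mean)
```

In this toy setting the two-sample average should sit near zero, while the one-sample average should sit near the total noise variance (2 × 0.1² = 0.02), even though the coarse prediction is perfectly coherent.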

Preprint, 2023, arXiv

A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs

U-Net architectures are ubiquitous in state-of-the-art deep learning; however, their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds to projection within the space of square-integrable functions and show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes which flow from a point mass, introducing sampling instabilities. We also demonstrate that HVAEs learn a representation of time which allows for improved parameter efficiency through weight-sharing. We use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models, exploiting the properties of our continuous-time formulation.
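The link between average pooling, projection, and the Haar basis can be checked numerically in one dimension. This is a minimal finite-dimensional sketch, not the paper's function-space argument; the helper names and the test signal are hypothetical.

```python
import numpy as np

def avg_pool(x):
    """Average non-overlapping pairs (1-D average pooling with stride 2)."""
    return x.reshape(-1, 2).mean(axis=1)

def upsample(x):
    """Repeat each value twice (embed back as a piecewise-constant signal)."""
    return np.repeat(x, 2)

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

x = np.array([4.0, 2.0, 5.0, 9.0, 1.0, 1.0, 0.0, 6.0])  # hypothetical signal
projected = upsample(avg_pool(x))   # pool, then embed back at full resolution
approx, detail = haar_step(x)
residual = x - projected            # the part of x that pooling discards

print(np.allclose(avg_pool(x), approx / np.sqrt(2)))       # pooling = scaled Haar approximation
print(np.allclose(upsample(avg_pool(projected)), projected))  # applying it twice changes nothing
print(abs(np.dot(residual, projected)) < 1e-12)            # discarded part is orthogonal
```

Pooling followed by upsampling is idempotent and leaves an orthogonal residual, which is what makes it an orthogonal projection; the pooled values match the Haar approximation coefficients up to a factor of √2.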

Preprint, 2020, arXiv

BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders

Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction. However, in application domains such as genomics where data sets are typically tabular and high-dimensional, a black-box approach to dimensionality reduction does not provide sufficient insights. Common data analysis workflows additionally use clustering techniques to identify groups of similar features. This usually leads to a two-stage process; however, it would be desirable to construct a joint modelling framework for simultaneous dimensionality reduction and clustering of features. In this paper, we propose to achieve this through the BasisVAE: a combination of the VAE and a probabilistic clustering prior, which lets us learn a one-hot basis function representation as part of the decoder network. Furthermore, for scenarios where not all features are aligned, we develop an extension to handle translation-invariant basis functions. We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE, demonstrated on various toy examples as well as on single-cell gene expression data.

Preprint, 2020, arXiv

Neural Decomposition: Functional ANOVA with Variational Autoencoders

Variational Autoencoders (VAEs) have become a popular approach for dimensionality reduction. However, despite their ability to identify latent low-dimensional structures embedded within high-dimensional data, these latent representations are typically hard to interpret on their own. Due to the black-box nature of VAEs, their utility for healthcare and genomics applications has been limited. In this paper, we focus on characterising the sources of variation in Conditional VAEs. Our goal is to provide a feature-level variance decomposition, i.e. to decompose variation in the data by separating out the marginal additive effects of latent variables z and fixed inputs c from their non-linear interactions. We propose to achieve this through what we call Neural Decomposition - an adaptation of the well-known concept of functional ANOVA variance decomposition from classical statistics to deep learning models. We show how identifiability can be achieved by training models subject to constraints on the marginal properties of the decoder networks. We demonstrate the utility of our Neural Decomposition on a series of synthetic examples as well as high-dimensional genomics data.
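In generic functional ANOVA notation (an assumption here, not necessarily the paper's exact formulation), the decoder output for one feature splits into a constant $f_0$, two marginal effects, and an interaction, with zero-mean constraints that make the split identifiable:

$$
f(z, c) = f_0 + f_z(z) + f_c(c) + f_{zc}(z, c),
$$

$$
\int f_z(z)\,p(z)\,dz = 0, \quad
\int f_c(c)\,p(c)\,dc = 0, \quad
\int f_{zc}(z, c)\,p(z)\,dz = 0 \ \ \forall c, \quad
\int f_{zc}(z, c)\,p(c)\,dc = 0 \ \ \forall z.
$$

Without such constraints, any constant or marginal term could be shifted between components without changing $f$, so the variance attributed to each component would not be well defined.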

Preprint, 2015, arXiv

The Hamming Ball Sampler

We introduce the Hamming Ball Sampler, a novel Markov Chain Monte Carlo algorithm, for efficient inference in statistical models involving high-dimensional discrete state spaces. The sampling scheme uses an auxiliary variable construction that adaptively truncates the model space allowing iterative exploration of the full model space in polynomial time. The approach generalizes conventional Gibbs sampling schemes for discrete spaces and can be considered as a Big Data-enabled MCMC algorithm that provides an intuitive means for user-controlled balance between statistical efficiency and computational tractability. We illustrate the generic utility of our sampling algorithm through application to a range of statistical models.
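The auxiliary-variable idea can be sketched for a small binary state vector. This is a minimal sketch under assumptions rather than the authors' implementation: it draws an auxiliary point uniformly from the Hamming ball around the current state, then resamples the state from the target restricted to the ball around that auxiliary point, enumerating the ball exactly. The function names, the toy target, and the radius are hypothetical.

```python
import itertools
import numpy as np

def hamming_ball(center, radius):
    """All binary vectors within Hamming distance `radius` of `center`."""
    ball = []
    for r in range(radius + 1):
        for idx in itertools.combinations(range(len(center)), r):
            v = center.copy()
            v[np.array(idx, dtype=int)] ^= 1  # flip the chosen bits
            ball.append(v)
    return ball

def hamming_ball_step(x, log_target, radius, rng):
    """One update: draw auxiliary u near x, then resample x within the ball around u."""
    ball_x = hamming_ball(x, radius)
    u = ball_x[rng.integers(len(ball_x))]          # u ~ Uniform(ball around x)
    candidates = hamming_ball(u, radius)
    logp = np.array([log_target(v) for v in candidates])
    probs = np.exp(logp - logp.max())
    probs /= probs.sum()                           # target restricted to ball around u
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy run (hypothetical target on 3 bits that favours ones).
rng = np.random.default_rng(1)
log_target = lambda v: 1.2 * float(v.sum())
x = np.zeros(3, dtype=int)
n_steps = 5000
counts = {}
for _ in range(n_steps):
    x = hamming_ball_step(x, log_target, radius=1, rng=rng)
    key = tuple(x.tolist())
    counts[key] = counts.get(key, 0) + 1
```

With radius 1 each update can change up to two bits, and the cost per step grows with the size of the ball rather than the size of the full state space; a larger radius trades more computation per step for larger moves.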

Preprint, 2013, arXiv

A decision-theoretic approach for segmental classification

This paper is concerned with statistical methods for the segmental classification of linear sequence data where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data $y$ and a statistical model $\pi(x,y)$ of the hidden states $x$, what should we report as the prediction $\hat{x}$ under the posterior distribution $\pi(x|y)$? That is, how should you make a prediction of the underlying states? We demonstrate that traditional approaches such as reporting the most probable state sequence or most probable set of marginal predictions can give undesirable classification artefacts and offer limited control over the properties of the prediction. We propose a decision-theoretic approach using a novel class of Markov loss functions and report $\hat{x}$ via the principle of minimum expected loss (maximum expected utility). We demonstrate that the sequence of minimum expected loss under the Markov loss function can be enumerated exactly using dynamic programming methods and that it offers flexibility and performance improvements over existing techniques. The result is generic and applicable to any probabilistic model on a sequence, such as Hidden Markov models, change point or product partition models.
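The minimum expected loss principle can be written generically as follows, where $L$ is any loss function (the paper's specific Markov loss is not reproduced here):

$$
\hat{x} = \arg\min_{x'} \sum_{x} L(x, x')\,\pi(x \mid y).
$$

The two traditional reports are special cases: the 0-1 loss $L(x, x') = \mathbb{1}[x \neq x']$ yields the most probable state sequence, and the Hamming loss $L(x, x') = \sum_t \mathbb{1}[x_t \neq x'_t]$ yields the sequence of most probable marginals.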

Preprint, 2013, arXiv

Statistical Inference in Hidden Markov Models using $k$-segment Constraints

Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward (F-B) algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming algorithms, which we collectively call $k$-segment algorithms, that allow us to i) find MAP sequences, ii) compute posterior probabilities and iii) simulate sample paths conditional on a user-specified number of segments, i.e. contiguous runs in a hidden state, possibly of a particular type. We illustrate the utility of these methods using simulated and real examples and highlight the application of prospective and retrospective use of these methods for fitting HMMs or exploring existing model fits.
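One way to constrain the number of segments is to run a Viterbi-style recursion over an augmented index of (segments used so far, current state). The sketch below illustrates that idea for the MAP sequence only; it is an assumption about how such a recursion can look, not the paper's algorithm, and the function name and toy model are hypothetical. Its cost is linear in the sequence length.

```python
import numpy as np

def k_segment_map(log_init, log_trans, log_emit, k):
    """Most probable state path with exactly k segments (k - 1 state changes)."""
    T, S = log_emit.shape
    # score[j, s]: best log-probability of a prefix ending in state s using j + 1 segments
    score = np.full((k, S), -np.inf)
    score[0] = log_init + log_emit[0]
    back = np.zeros((T, k, S), dtype=int)
    for t in range(1, T):
        new = np.full((k, S), -np.inf)
        for j in range(k):
            for s in range(S):
                best, arg = score[j, s] + log_trans[s, s], s  # stay in the same segment
                if j > 0:
                    for r in range(S):  # or open a new segment from a different state
                        cand = score[j - 1, r] + log_trans[r, s]
                        if r != s and cand > best:
                            best, arg = cand, r
                new[j, s] = best + log_emit[t, s]
                back[t, j, s] = arg
        score = new
    # Trace back from the best final state that used exactly k segments.
    s = int(np.argmax(score[k - 1]))
    j, path = k - 1, [s]
    for t in range(T - 1, 0, -1):
        r = int(back[t, j, s])
        if r != s:
            j -= 1  # a state change means the previous step used one fewer segment
        s = r
        path.append(s)
    return path[::-1], float(score[k - 1].max())

# Toy two-state model (hypothetical): observations favour the path 0, 0, 1, 1, 0.
truth = np.array([0, 0, 1, 1, 0])
log_init = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_emit = np.log(np.where(np.arange(2)[None, :] == truth[:, None], 0.9, 0.1))

path1, _ = k_segment_map(log_init, log_trans, log_emit, 1)
path2, _ = k_segment_map(log_init, log_trans, log_emit, 2)
path3, _ = k_segment_map(log_init, log_trans, log_emit, 3)
print(path1, path2, path3)
```

Varying `k` on the same data shows how the constraint changes the reported segmentation, which is the kind of exploration of an existing model fit the abstract describes.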

Preprint, 2013, arXiv

The Alive Particle Filter

In the following article we develop a particle filter for approximating Feynman-Kac models with indicator potentials. Examples of such models include approximate Bayesian computation (ABC) posteriors associated with hidden Markov models (HMMs) or rare-event problems. Such models require the use of advanced particle filter or Markov chain Monte Carlo (MCMC) algorithms, e.g. Jasra et al. (2012), to perform estimation. One of the drawbacks of existing particle filters is that they may 'collapse', in that the algorithm may terminate early, due to the indicator potentials. In this article, using a special case of the locally adaptive particle filter in Lee et al. (2013), which is closely related to Le Gland & Oudjane (2004), we use an algorithm which can deal with this latter problem, whilst introducing a random cost per time step. This algorithm is investigated from a theoretical perspective and several results are given which help to validate the algorithms and to provide guidelines for their implementation. In addition, we show how this algorithm can be used within MCMC, using particle MCMC (Andrieu et al. 2010). Numerical examples are presented for ABC approximations of HMMs.
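The trade of a fixed particle count for a random cost can be illustrated with a single propagation step that keeps proposing until enough particles survive the indicator. This is a minimal sketch of that resample-until-alive idea under assumptions, not the paper's algorithm: the function name, the choice to keep all but the last survivor, and the toy model are hypothetical. The weight estimate uses the classical unbiased inverse-binomial estimator of a success probability.

```python
import numpy as np

def alive_step(particles, propagate, is_alive, n_alive, rng):
    """Propose until n_alive survivors are found (requires n_alive >= 2).

    Returns the first n_alive - 1 survivors, an estimate of the survival
    probability, and the random number of proposals that were needed.
    """
    survivors, trials = [], 0
    while len(survivors) < n_alive:
        parent = particles[rng.integers(len(particles))]  # pick an ancestor uniformly
        child = propagate(parent, rng)
        trials += 1
        if is_alive(child):  # indicator potential: the particle either survives or not
            survivors.append(child)
    # (n_alive - 1) / (trials - 1) is unbiased for the survival probability
    # when sampling stops at the n_alive-th success.
    return survivors[:-1], (n_alive - 1) / (trials - 1), trials

# Toy check (hypothetical model): a Gaussian step survives if it lands in (-0.5, 0.5).
rng = np.random.default_rng(0)
particles = [0.0] * 10
propagate = lambda parent, rng: parent + rng.standard_normal()
is_alive = lambda child: abs(child) < 0.5

estimates = []
for _ in range(1000):
    survivors, estimate, trials = alive_step(particles, propagate, is_alive, 20, rng)
    estimates.append(estimate)
mean_estimate = float(np.mean(estimates))
print(mean_estimate)
```

Because the loop only stops once enough particles are alive, the step cannot collapse to an empty population; the price is that the number of proposals per step is random. In this toy model the true survival probability is about 0.383, so the averaged estimate should be close to that value.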

Preprint, 2009, arXiv

On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers. For certain classes of Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.
