Source author record

Nick Whiteley

Nick Whiteley appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation, Methodology, Machine Learning, math.PR, math.ST, Statistics Theory, Applications, Neurons and Cognition

Catalog footprint

What is connected

25 works

8 topics

4 close collaborators

preprint, 2026, arXiv

Conditional Distribution Compression via the Kernel Conditional Mean Embedding

Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a metric for comparing conditional distributions, and derive a closed form estimator. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from cubic to linear. Leveraging this, we extend KH to propose Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm for constructing compressed sets that target the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), an adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we also propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.
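For orientation, the classical Kernel Herding baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration for unlabelled one-dimensional data, not the ACKH, JKH or inducing-point methods of the paper. The Gaussian kernel, the lengthscale, the restriction of candidates to the data set, and the function names `gaussian_kernel` and `kernel_herding` are all assumptions made for the example.

```python
import math


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel on the real line; the lengthscale is an assumed default."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def kernel_herding(data, m, kernel=gaussian_kernel):
    """Greedily pick m points from `data` whose kernel mean tracks that of `data`.

    At step t the candidate maximising
        mean_embedding(x) - (1 / (t + 1)) * sum_{s <= t} k(x, x_s)
    is selected, where x_1, ..., x_t are the points chosen so far.
    Candidates are restricted to the data set itself (an assumption of this sketch).
    """
    n = len(data)
    # Empirical kernel mean embedding evaluated at every candidate (quadratic cost here).
    mean_embedding = [sum(kernel(x, y) for y in data) / n for x in data]
    selected = []
    running = [0.0] * n  # running[i] = sum of k(data[i], x_s) over chosen x_s
    for t in range(m):
        scores = [mean_embedding[i] - running[i] / (t + 1) for i in range(n)]
        best = max(range(n), key=scores.__getitem__)
        selected.append(data[best])
        for i in range(n):
            running[i] += kernel(data[i], data[best])
    return selected


if __name__ == "__main__":
    points = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
    print(kernel_herding(points, 2))
```

On two well-separated clusters, the second selected point lands in the cluster not yet represented, because the penalty term discourages picking points close to those already chosen.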

preprint, 2022, arXiv

An invitation to sequential Monte Carlo samplers

Statisticians often use Monte Carlo methods to approximate probability distributions, primarily with Markov chain Monte Carlo and importance sampling. Sequential Monte Carlo samplers are a class of algorithms that combine both techniques to approximate distributions of interest and their normalizing constants. These samplers originate from particle filtering for state space models and have become general and scalable sampling techniques. This article describes sequential Monte Carlo samplers and their possible implementations, arguing that they remain under-used in statistics, despite their ability to perform sequential inference and to leverage parallel processing resources among other potential benefits.
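One common way such a sampler is set up can be sketched as below: a tempering path between an initial distribution and the target, with importance reweighting, resampling and a Metropolis move at each step. This is an illustrative toy under stated assumptions, not the article's own implementation. The Gaussian initial distribution, the target, the number of steps, the random-walk step size and all function names are assumed for the example.

```python
import math
import random

INITIAL_SD = 3.0  # assumed initial distribution N(0, 3^2)


def log_initial(x):
    """Log density of the normalized initial distribution N(0, INITIAL_SD^2)."""
    return -0.5 * (x / INITIAL_SD) ** 2 - math.log(INITIAL_SD * math.sqrt(2.0 * math.pi))


def log_target_unnorm(x):
    """Unnormalized log target exp(-x^2 / 2); its normalizing constant is sqrt(2 * pi)."""
    return -0.5 * x * x


def smc_sampler(n_particles=2000, n_steps=20, seed=0):
    """Tempered SMC sampler returning (log normalizing constant estimate, particles)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, INITIAL_SD) for _ in range(n_particles)]
    log_z = 0.0
    for t in range(1, n_steps + 1):
        beta_prev, beta = (t - 1) / n_steps, t / n_steps
        # Importance sampling: incremental weights between consecutive tempered targets.
        log_w = [(beta - beta_prev) * (log_target_unnorm(x) - log_initial(x))
                 for x in particles]
        shift = max(log_w)
        w = [math.exp(lw - shift) for lw in log_w]
        log_z += shift + math.log(sum(w) / n_particles)
        # Multinomial resampling, performed at every step for simplicity.
        particles = rng.choices(particles, weights=w, k=n_particles)

        # One random-walk Metropolis move that leaves the current tempered target invariant.
        def log_pi(x, beta=beta):
            return (1.0 - beta) * log_initial(x) + beta * log_target_unnorm(x)

        moved = []
        for x in particles:
            y = x + rng.gauss(0.0, 1.0)
            accept_prob = math.exp(min(0.0, log_pi(y) - log_pi(x)))
            moved.append(y if rng.random() < accept_prob else x)
        particles = moved
    return log_z, particles


if __name__ == "__main__":
    estimate, _ = smc_sampler()
    print(estimate, 0.5 * math.log(2.0 * math.pi))
```

The product of the average incremental weights estimates the ratio of normalizing constants between the target and the initial distribution, which is how the sampler approximates both the distribution and its normalizing constant.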

preprint, 2022, arXiv

Exploiting locality in high-dimensional factorial hidden Markov models

We propose algorithms for approximate filtering and smoothing in high-dimensional factorial hidden Markov models. The approximation involves discarding, in a principled way, likelihood factors according to a notion of locality in a factor graph associated with the emission distribution. This allows the exponential-in-dimension cost of exact filtering and smoothing to be avoided. We prove that the approximation accuracy, measured in a local total variation norm, is "dimension-free" in the sense that as the overall dimension of the model increases the error bounds we derive do not necessarily degrade. A key step in the analysis is to quantify the error introduced by localizing the likelihood function in a Bayes' rule update. The factorial structure of the likelihood function which we exploit arises naturally when data have known spatial or network structure. We demonstrate the new algorithms on synthetic examples and a London Underground passenger flow problem, where the factor graph is effectively given by the train network.

preprint, 2021, arXiv

Dimension-free Wasserstein contraction of nonlinear filters

For a class of partially observed diffusions, conditions are given for the map from the initial condition of the signal to the filtering distribution to be contractive with respect to Wasserstein distances, with a rate which does not necessarily depend on the dimension of the state-space. The main assumptions are that the signal has affine drift and constant diffusion coefficient and that the likelihood functions are log-concave. Ergodic and nonergodic signals are handled in a single framework. Examples include linear-Gaussian, stochastic volatility, neural spike-train and dynamic generalized linear models. For these examples filter stability can be established without any assumptions on the observations.

preprint, 2021, arXiv

Inference in Stochastic Epidemic Models via Multinomial Approximations

We introduce a new method for inference in stochastic epidemic models which uses recursive multinomial approximations to integrate over unobserved variables and thus circumvent likelihood intractability. The method is applicable to a class of discrete-time, finite-population compartmental models with partial, randomly under-reported or missing count observations. In contrast to state-of-the-art alternatives such as Approximate Bayesian Computation techniques, no forward simulation of the model is required and there are no tuning parameters. Evaluating the approximate marginal likelihood of model parameters is achieved through a computationally simple filtering recursion. The accuracy of the approximation is demonstrated through analysis of real and simulated data using a model of the 1995 Ebola outbreak in the Democratic Republic of Congo. We show how the method can be embedded within a Sequential Monte Carlo approach to estimating the time-varying reproduction number of COVID-19 in Wuhan, China, recently published by Kucharski et al. 2020.

preprint, 2020, arXiv

Global consensus Monte Carlo

To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribute the data across multiple machines. We consider a likelihood function expressed as a product of terms, each associated with a subset of the data. Inspired by global variable consensus optimisation, we introduce an instrumental hierarchical model associating auxiliary statistical parameters with each term, which are conditionally independent given the top-level parameters. One of these top-level parameters controls the unconditional strength of association between the auxiliary parameters. This model leads to a distributed MCMC algorithm on an extended state space yielding approximations of posterior expectations. A trade-off between computational tractability and fidelity to the original model can be controlled by changing the association strength in the instrumental model. We further propose the use of a SMC sampler with a sequence of association strengths, allowing both the automatic determination of appropriate strengths and for a bias correction technique to be applied. In contrast to similar distributed Monte Carlo algorithms, this approach requires few distributional assumptions. The performance of the algorithms is illustrated with a number of simulated examples.

preprint, 2020, arXiv

Negative association, ordering and convergence of resampling methods

We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost-sure weak convergence of measures output from Kitagawa's (1996) stratified resampling method. Carpenter et al.'s (1999) systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of Srinivasan (2001), which shares some attractive properties of systematic resampling, but which exhibits negative association and therefore converges irrespective of the order of the input samples. We confirm a conjecture made by Kitagawa (1996) that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^d$, the variance of the resampling error is ${\scriptscriptstyle\mathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
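The structural similarity between the stratified and systematic schemes named in this abstract can be seen in a short sketch. Both push a grid of points in [0, 1) through the inverse of the cumulative weights; stratified resampling draws one uniform per stratum, while systematic resampling reuses a single uniform. The function names and the helper `_inverse_cdf` are hypothetical, and the new algorithm based on stochastic rounding proposed in the paper is not reproduced here.

```python
import random
from itertools import accumulate


def _inverse_cdf(weights, points):
    """Map increasing points in [0, 1) to particle indices via cumulative weights."""
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point round-off in the last entry
    last = len(weights) - 1
    indices, j = [], 0
    for u in points:
        # Advance to the first index whose cumulative weight exceeds u.
        while j < last and u >= cdf[j]:
            j += 1
        indices.append(j)
    return indices


def stratified_resample(weights, rng=random):
    """One independent uniform in each stratum [i/N, (i+1)/N)."""
    n = len(weights)
    points = [(i + rng.random()) / n for i in range(n)]
    return _inverse_cdf(weights, points)


def systematic_resample(weights, rng=random):
    """A single uniform shared by all strata, so the output depends on input order."""
    n = len(weights)
    u = rng.random()
    points = [(i + u) / n for i in range(n)]
    return _inverse_cdf(weights, points)


if __name__ == "__main__":
    rng = random.Random(1)
    w = [0.1, 0.2, 0.3, 0.4]
    print(stratified_resample(w, rng), systematic_resample(w, rng))
```

With equal weights both schemes return every index exactly once, whichever uniforms are drawn, which is one reason they are often preferred to multinomial resampling.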

preprint, 2016, arXiv

An algorithm for approximating the second moment of the normalizing constant estimate from a particle filter

We propose a new algorithm for approximating the non-asymptotic second moment of the marginal likelihood estimate, or normalizing constant, provided by a particle filter. The computational cost of the new method is $O(M)$ per time step, independently of the number of particles $N$ in the particle filter, where $M$ is a parameter controlling the quality of the approximation. This is in contrast to $O(MN)$ for a simple averaging technique using $M$ i.i.d. replicates of a particle filter with $N$ particles. We establish that the approximation delivered by the new algorithm is unbiased, strongly consistent and, under standard regularity conditions, increasing $M$ linearly with time is sufficient to prevent growth of the relative variance of the approximation, whereas for the simple averaging technique it can be necessary to increase $M$ exponentially with time in order to achieve the same effect. Numerical examples illustrate performance in the context of a stochastic Lotka-Volterra system and a simple AR(1) model.

preprint, 2016, arXiv

Calculating principal eigen-functions of non-negative integral kernels: particle approximations and applications

Often in applications such as rare events estimation or optimal control it is required that one calculates the principal eigen-function and eigen-value of a non-negative integral kernel. Except in the finite-dimensional case, usually neither the principal eigen-function nor the eigen-value can be computed exactly. In this paper, we develop numerical approximations for these quantities. We show how a generic interacting particle algorithm can be used to deliver numerical approximations of the eigen-quantities and the associated so-called "twisted" Markov kernel as well as how these approximations are relevant to the aforementioned applications. In addition, we study a collection of random integral operators underlying the algorithm, address some of their mean and path-wise properties, and obtain $L_{r}$ error estimates. Finally, numerical examples are provided in the context of importance sampling for computing tail probabilities of Markov chains and computing value functions for a class of stochastic optimal control problems.

preprint, 2016, arXiv

Fluctuations, stability and instability of a distributed particle filter with local exchange

We study a distributed particle filter proposed by Bolić et al. (2005). This algorithm involves $m$ groups of $M$ particles, with interaction between groups occurring through a "local exchange" mechanism. We establish a central limit theorem in the regime where $M$ is fixed and $m\to\infty$. A formula we obtain for the asymptotic variance can be interpreted in terms of colliding Markov chains, enabling analytic and numerical evaluations of how the asymptotic variance behaves over time, with comparison to a benchmark algorithm consisting of $m$ independent particle filters. We prove that subject to regularity conditions, when $m$ is fixed both algorithms converge time-uniformly at rate $M^{-1/2}$. Through use of our asymptotic variance formula we give counter-examples satisfying the same regularity conditions to show that when $M$ is fixed neither algorithm, in general, converges time-uniformly at rate $m^{-1/2}$.

preprint, 2016, arXiv

On the role of interaction in sequential Monte Carlo algorithms

We introduce a general form of sequential Monte Carlo algorithm defined in terms of a parameterized resampling mechanism. We find that a suitably generalized notion of the Effective Sample Size (ESS), widely used to monitor algorithm degeneracy, appears naturally in a study of its convergence properties. We are then able to phrase sufficient conditions for time-uniform convergence in terms of algorithmic control of the ESS, in turn achievable by adaptively modulating the interaction between particles. This leads us to suggest novel algorithms which are, in senses to be made precise, provably stable and yet designed to avoid the degree of interaction which hinders parallelization of standard algorithms. As a byproduct, we prove time-uniform convergence of the popular adaptive resampling particle filter.
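As a point of reference, the standard Effective Sample Size and the adaptive resampling rule it drives can be sketched as follows. This shows only the familiar ESS, not the generalized notion introduced in the paper. The threshold of one half and the function names are assumptions chosen for illustration.

```python
import math


def effective_sample_size(log_weights):
    """Standard ESS, (sum w)^2 / sum w^2, computed stably from log-weights."""
    shift = max(log_weights)
    w = [math.exp(lw - shift) for lw in log_weights]
    total = sum(w)
    return total * total / sum(x * x for x in w)


def should_resample(log_weights, threshold=0.5):
    """Adaptive rule: resample only when the ESS drops below threshold * N."""
    return effective_sample_size(log_weights) < threshold * len(log_weights)


if __name__ == "__main__":
    print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))    # equal weights
    print(should_resample([0.0, -50.0, -50.0, -50.0]))    # one dominant weight
```

The ESS equals the number of particles when the weights are equal and approaches one when a single weight dominates, which is why it is used to monitor degeneracy.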

preprint, 2016, arXiv

Perfect sampling for nonhomogeneous Markov chains and hidden Markov models

We obtain a perfect sampling characterization of weak ergodicity for backward products of finite stochastic matrices, and equivalently, simultaneous tail triviality of the corresponding nonhomogeneous Markov chains. Applying these ideas to hidden Markov models, we show how to sample exactly from the finite-dimensional conditional distributions of the signal process given infinitely many observations, using an algorithm which requires only an almost surely finite number of observations to actually be accessed. A notion of "successful" coupling is introduced and its occurrence is characterized in terms of conditional ergodicity properties of the hidden Markov model and related to the stability of nonlinear filters.

preprint, 2016, arXiv

Variance estimation in the particle filter

This paper concerns numerical assessment of Monte Carlo error in particle filters. We show that by keeping track of certain key features of the genealogical structure arising from resampling operations, it is possible to estimate variances of a number of standard Monte Carlo approximations which particle filters deliver. All our estimators can be computed from a single run of a particle filter with no further simulation. We establish that as the number of particles grows, our estimators are weakly consistent for asymptotic variances of the Monte Carlo approximations and some of them are also non-asymptotically unbiased. The asymptotic variances can be decomposed into terms corresponding to each time step of the algorithm, and we show how to consistently estimate each of these terms. When the number of particles may vary over time, this allows approximation of the asymptotically optimal allocation of particle numbers.

preprint, 2015, arXiv

Stability with respect to initial conditions in V-norm for nonlinear filters with ergodic observations

We establish conditions for an exponential rate of forgetting of the initial distribution of nonlinear filters in $V$-norm, path-wise along almost all observation sequences. In contrast to previous works, our results allow for unbounded test functions. The analysis is conducted in a general setup involving nonnegative kernels in a random environment which allows treatment of filters and prediction filters in a single framework. The main result is illustrated on two examples, the first showing that a total variation norm stability result obtained by Douc et al. (2009) can be extended to $V$-norm without any additional assumptions, the second concerning a situation in which forgetting of the initial condition holds in $V$-norm for the filters, but the $V$-norm of each prediction filter is infinite.

preprint, 2014, arXiv

A hidden Markov model for decoding and the analysis of replay in spike trains

We present a hidden Markov model that describes variation in an animal's position associated with varying levels of activity in action potential spike trains of individual place cell neurons. The model incorporates a coarse-graining of position, which we find to be a more parsimonious description of the system than other models. We use a sequential Monte Carlo algorithm for Bayesian inference of model parameters, including the state space dimension, and we explain how to estimate position from spike train observations (decoding). We obtain greater accuracy over other methods in the conditions of high temporal resolution and small neuronal sample size. We also present a novel, model-based approach to the study of replay: the expression of spike train activity related to behaviour during times of motionlessness or sleep, thought to be integral to the consolidation of long-term memories. We demonstrate how we can detect the time, information content and compression rate of replay events in simulated and real hippocampal data recorded from rats in two different environments, and verify the correlation between the times of detected replay events and of sharp wave/ripples in the local field potential.

preprint, 2014, arXiv

Butterfly resampling: asymptotics for particle filters with constrained interactions

We generalize the elementary mechanism of sampling with replacement $N$ times from a weighted population of size $N$, by introducing auxiliary variables and constraints on conditional independence characterised by modular congruence relations. Motivated by considerations of parallelism, a convergence study reveals how sparsity of the mechanism's conditional independence graph is related to fluctuation properties of particle filters which use it for resampling, in some cases exhibiting exotic scaling behaviour. The proofs involve detailed combinatorial analysis of conditional independence graphs.

preprint, 2014, arXiv

Forest resampling for distributed sequential Monte Carlo

This paper brings explicit considerations of distributed computing architectures and data structures into the rigorous design of Sequential Monte Carlo (SMC) methods. A theoretical result established recently by the authors shows that adapting interaction between particles to suitably control the Effective Sample Size (ESS) is sufficient to guarantee stability of SMC algorithms. Our objective is to leverage this result and devise algorithms which are thus guaranteed to work well in a distributed setting. We make three main contributions to achieve this. Firstly, we study mathematical properties of the ESS as a function of matrices and graphs that parameterize the interaction amongst particles. Secondly, we show how these graphs can be induced by tree data structures which model the logical network topology of an abstract distributed computing environment. Thirdly, we present efficient distributed algorithms that achieve the desired ESS control, perform resampling and operate on forests associated with these trees.

preprint, 2014, arXiv

Twisted particle filters

We investigate sampling laws for particle algorithms and the influence of these laws on the efficiency of particle approximations of marginal likelihoods in hidden Markov models. Among a broad class of candidates we characterize the essentially unique family of particle system transition kernels which is optimal with respect to an asymptotic-in-time variance growth rate criterion. The sampling structure of the algorithm defined by these optimal transitions turns out to be only subtly different from standard algorithms and yet the fluctuation properties of the estimates it provides can be dramatically different. The structure of the optimal transition suggests a new class of algorithms, which we term "twisted" particle filters and which we validate with asymptotic analysis of a more traditional nature, in the regime where the number of particles tends to infinity.

preprint, 2013, arXiv

Stability properties of some particle filters

Under multiplicative drift and other regularity conditions, it is established that the asymptotic variance associated with a particle filter approximation of the prediction filter is bounded uniformly in time, and the nonasymptotic, relative variance associated with a particle approximation of the normalizing constant is bounded linearly in time. The conditions are demonstrated to hold for some hidden Markov models on noncompact state spaces. The particle stability results are obtained by proving $v$-norm multiplicative stability and exponential moment results for the underlying Feynman-Kac formulas.

preprint, 2012, arXiv

Approximate Bayesian Computation for Smoothing

We consider a method for approximate inference in hidden Markov models (HMMs). The method circumvents the need to evaluate conditional densities of observations given the hidden states. It may be considered an instance of Approximate Bayesian Computation (ABC) and it involves the introduction of auxiliary variables valued in the same space as the observations. The quality of the approximation may be controlled to arbitrary precision through a parameter ε > 0. We provide theoretical results which quantify, in terms of ε, the ABC error in approximation of expectations of additive functionals with respect to the smoothing distributions. Under regularity assumptions, this error is O(nε), where n is the number of time steps over which smoothing is performed. For numerical implementation we adopt the forward-only sequential Monte Carlo (SMC) scheme of [16] and quantify the combined error from the ABC and SMC approximations. This forms some of the first quantitative results for ABC methods which jointly treat the ABC and simulation errors, with a finite number of data and simulated samples. When the HMM has unknown static parameters, we consider particle Markov chain Monte Carlo [2] (PMCMC) methods for batch statistical inference.

preprint, 2012, arXiv

Bayesian learning of noisy Markov decision processes

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

preprint, 2012, arXiv

Linear Variance Bounds for Particle Approximations of Time-Homogeneous Feynman-Kac Formulae

This article establishes sufficient conditions for a linear-in-time bound on the non-asymptotic variance of particle approximations of time-homogeneous Feynman-Kac formulae. These formulae appear in a wide variety of applications including option pricing in finance and risk sensitive control in engineering. In direct Monte Carlo approximation of these formulae, the non-asymptotic variance typically increases at an exponential rate in the time parameter. It is shown that a linear bound holds when a non-negative kernel, defined by the logarithmic potential function and Markov kernel which specify the Feynman-Kac model, satisfies a type of multiplicative drift condition and other regularity assumptions. Examples illustrate that these conditions are general and flexible enough to accommodate two rather extreme cases, which can occur in the context of a non-compact state space: 1) when the potential function is bounded above, not bounded below and the Markov kernel is not ergodic; and 2) when the potential function is not bounded above, but the Markov kernel itself satisfies a multiplicative drift condition.

preprint, 2011, arXiv

Error Bounds and Normalizing Constants for Sequential Monte Carlo in High Dimensions

In a recent paper (Beskos et al., 2011), the Sequential Monte Carlo (SMC) sampler introduced in Del Moral et al. (2006) and Neal (2001) was shown to be asymptotically stable in the dimension of the state space $d$, at a cost that is only polynomial in $d$, when $N$, the number of Monte Carlo samples, is fixed. More precisely, it has been established that the effective sample size (ESS) of the ensuing (approximate) sample and the Monte Carlo error of fixed dimensional marginals will converge as $d$ grows, with a computational cost of $\mathcal{O}(Nd^2)$. In the present work, further results on SMC methods in high dimensions are provided as $d\to\infty$ and with $N$ fixed. We deduce an explicit bound on the Monte Carlo error for estimates derived using the SMC sampler and the exact asymptotic relative $\mathbb{L}_2$-error of the estimate of the normalizing constant. We also establish marginal propagation of chaos properties of the algorithm. The accuracy in high dimensions of some approximate SMC-based filtering schemes is also discussed.

preprint, 2011, arXiv

Sequential Monte Carlo samplers: error bounds and insensitivity to initial conditions

This paper addresses finite sample stability properties of sequential Monte Carlo methods for approximating sequences of probability distributions. The results presented herein are applicable in the scenario where the start and end distributions in the sequence are fixed and the number of intermediate steps is a parameter of the algorithm. Under assumptions which hold on non-compact spaces, it is shown that the effect of the initial distribution decays exponentially fast in the number of intermediate steps and the corresponding stochastic error is stable in $\mathbb{L}_{p}$ norm.

preprint, 2010, arXiv

Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo Methods

Switching state-space models (SSSM) are a very popular class of time series models that have found many applications in statistics, econometrics and advanced signal processing. Bayesian inference for these models typically relies on Markov chain Monte Carlo (MCMC) techniques. However, even sophisticated MCMC methods dedicated to SSSM can prove quite inefficient as they update potentially strongly correlated discrete-valued latent variables one-at-a-time (Carter and Kohn, 1996; Gerlach et al., 2000; Giordani and Kohn, 2008). Particle Markov chain Monte Carlo (PMCMC) methods are a recently developed class of MCMC algorithms which use particle filters to build efficient proposal distributions in high dimensions (Andrieu et al., 2010). The existing PMCMC methods of Andrieu et al. (2010) are applicable to SSSM, but are restricted to employing standard particle filtering techniques. Yet, in the context of discrete-valued latent variables, specialised particle techniques have been developed which can outperform standard methods by up to an order of magnitude (Fearnhead, 1998; Fearnhead and Clifford, 2003; Fearnhead, 2004). In this paper we develop a novel class of PMCMC methods relying on these very efficient particle algorithms. We establish the theoretical validity of this new generic methodology, referred to as discrete PMCMC, and demonstrate it on a variety of examples including a multiple change-points model for well-log data and a model for U.S./U.K. exchange rate data. Discrete PMCMC algorithms are shown experimentally to outperform state-of-the-art MCMC techniques for a fixed computational complexity. Additionally, they can be easily parallelized (Lee et al., 2010), which allows further substantial gains.
