Source author record

Arnaud Doucet

Arnaud Doucet appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

64 works

20 topics

4 close collaborators

preprint · 2026 · arXiv

Metropolis-Adjusted Diffusion Models

Sampling from score-based diffusion models incurs bias due to both time discretisation and the approximation of the score function. A common strategy for reducing this bias is to apply corrector steps based on the unadjusted Langevin algorithm (ULA) at each noise level within a predictor-corrector framework. However, ULA is itself a biased sampler, as it discretises a continuous diffusion process. In this work, we consider adjusted Langevin correctors that employ Metropolis--Hastings (MH) or Barker's accept-reject steps to correct for this bias. Since the target density ratio typically required by MH-based algorithms is unavailable, we propose methods that instead utilise the score function to compute the correct acceptance probability. We introduce the first exact method for adjusting Langevin corrections in diffusion models, based on a two-coin Bernoulli factory algorithm. We also propose an efficient approximation based on Simpson's rule that achieves accuracy of order $5/2$ in the step size at near-zero marginal cost. We demonstrate that these procedures improve sample quality on both synthetic and image datasets, yielding consistent gains in Fréchet Inception Distance (FID) on the latter.
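The gap between an unadjusted and a Metropolis-adjusted Langevin corrector can be illustrated on a toy problem. The sketch below is not the method of the paper: it assumes a one-dimensional standard normal target whose density ratio is available, which is exactly what diffusion models lack. The names `log_target`, `score`, `langevin_step` and `sample_variance` are hypothetical. For this target, ULA with step size h has stationary variance 1/(1 - h/2) rather than 1, whereas the adjusted chain leaves the target invariant.

```python
import math
import random

def log_target(x):
    # Assumed toy target: standard normal log density, up to a constant.
    return -0.5 * x * x

def score(x):
    # Score = gradient of the log density; for N(0, 1) it is -x.
    return -x

def log_proposal(x_to, x_from, step):
    # Log density (up to a constant) of the Langevin proposal
    # N(x_from + step * score(x_from), 2 * step), evaluated at x_to.
    mean = x_from + step * score(x_from)
    return -((x_to - mean) ** 2) / (4.0 * step)

def langevin_step(x, step, rng, adjust):
    # One Langevin move; with adjust=False this is plain ULA.
    proposal = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    if not adjust:
        return proposal
    # Metropolis-Hastings correction using the density ratio, which is
    # available here only because the target is a known toy density.
    log_ratio = (log_target(proposal) + log_proposal(x, proposal, step)
                 - log_target(x) - log_proposal(proposal, x, step))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return x

def sample_variance(step, n_steps, adjust, seed=0):
    # Run one chain from x = 0 and return the empirical variance of its states.
    rng = random.Random(seed)
    x, total, total_sq = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        x = langevin_step(x, step, rng, adjust)
        total += x
        total_sq += x * x
    mean = total / n_steps
    return total_sq / n_steps - mean * mean
```

Calling `sample_variance(0.5, 50000, adjust=False)` and then with `adjust=True` contrasts the inflated ULA variance with the adjusted chain, whose empirical variance should be close to 1.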

preprint · 2026 · arXiv

On the Wasserstein Gradient Flow Interpretation of Drifting Models

Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main results: first, that one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, that the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, the same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.

preprint · 2024 · arXiv

Ranking In Generalized Linear Bandits

We study the ranking problem in generalized linear bandits. At each time, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal as both position and item dependencies result in a complex reward function. A very naive example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies where position discount is a particular case, and connecting the ranking problem to graph theory.

preprint · 2023 · arXiv

A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs

U-Net architectures are ubiquitous in state-of-the-art deep learning; however, their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds to projection within the space of square-integrable functions and show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes which flow from a point mass, introducing sampling instabilities. We also demonstrate that HVAEs learn a representation of time which allows for improved parameter efficiency through weight-sharing. We use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models, exploiting the properties of our continuous-time formulation.
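The link between average pooling and Haar wavelets can be checked on a short one-dimensional signal. The sketch below is an illustration under simplifying assumptions, not the construction of the paper: it uses an unnormalised Haar transform (plain averages and half-differences), and the function names are hypothetical. Pooling followed by piecewise-constant upsampling acts as an orthogonal projection, and the discarded half-differences are the Haar detail coefficients.

```python
def average_pool(signal):
    # Pairwise averages: an even-length signal of length 2n becomes length n.
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def haar_details(signal):
    # Pairwise half-differences: the information average pooling discards.
    return [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def upsample(averages):
    # Piecewise-constant signal at the finer resolution.
    return [a for a in averages for _ in range(2)]

def reconstruct(averages, details):
    # Invert the one-level (unnormalised) Haar transform.
    out = []
    for a, d in zip(averages, details):
        out.extend([a + d, a - d])
    return out

signal = [4.0, 2.0, 5.0, 7.0]
coarse = average_pool(signal)       # coarse-resolution coefficients
details = haar_details(signal)      # detail coefficients
projected = upsample(coarse)        # projection onto piecewise-constant signals
residual = [x - p for x, p in zip(signal, projected)]
```

Here `reconstruct(coarse, details)` recovers `signal` exactly, pooling `projected` again returns `coarse` (idempotence), and `residual` is orthogonal to `projected`.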

preprint · 2022 · arXiv

An Empirical Study of Implicit Regularization in Deep Offline RL

Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior works have shown that neural nets trained with TD-learning and gradient descent can exhibit implicit regularization that can be characterized by under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called effective rank, has been observed to drastically collapse during training. In turn, this collapse has been argued to reduce the model's ability to further adapt in later stages of learning, leading to diminished final performance. Such an association between the effective rank and performance makes effective rank compelling for offline RL, primarily for offline policy evaluation. In this work, we conduct a careful empirical study on the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind lab. We observe that a direct association exists only in restricted settings and disappears in the more extensive hyperparameter sweeps. Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics and find that bootstrapping alone is insufficient to explain the collapse of the effective rank. Further, we show that several other factors could confound the relationship between effective rank and performance and conclude that studying this association under simplistic assumptions could be highly misleading.

preprint · 2022 · arXiv

Chained Generalisation Bounds

This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obtained by lifting the regularity assumption from the loss onto its gradient. This allows us to re-derive the chaining mutual information bound from the literature, and to obtain novel chained information-theoretic generalisation bounds, based on the Wasserstein distance and other probability metrics. We show on some toy examples that the chained generalisation bound can be significantly tighter than its standard counterpart, particularly when the distribution of the hypotheses selected by the algorithm is very concentrated. Keywords: Generalisation bounds; Chaining; Information-theoretic bounds; Mutual information; Wasserstein distance; PAC-Bayes.

preprint · 2022 · arXiv

Conditional Simulation Using Diffusion Schrödinger Bridges

Denoising diffusion models have recently emerged as a powerful class of generative models. They provide state-of-the-art results, not only for unconditional simulation, but also when used to solve conditional simulation problems arising in a wide range of inverse problems. A limitation of these models is that they are computationally intensive at generation time as they require simulating a diffusion process over a long time horizon. When performing unconditional simulation, a Schrödinger bridge formulation of generative modeling leads to a theoretically grounded algorithm shortening generation time which is complementary to other proposed acceleration techniques. We extend the Schrödinger bridge framework to conditional simulation. We demonstrate this novel methodology on various applications including image super-resolution, optimal filtering for state-space models and the refinement of pre-trained networks. Our code can be found at https://github.com/vdeborto/cdsb.

preprint · 2022 · arXiv

Conditionally Gaussian PAC-Bayes

Recent studies have empirically investigated different methods to train stochastic neural networks on a classification task by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound. The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound, without relying on any surrogate loss. Empirical results show that this approach outperforms currently available PAC-Bayesian training methods.

preprint · 2022 · arXiv

Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model when the network width is large. Indeed, under regularity conditions, the NTK converges to a time-independent kernel in the infinite-width limit. This regime is often called the NTK regime. In parallel, recent works on signal propagation (Poole et al., 2016; Schoenholz et al., 2017; Hayou et al., 2019a) studied the impact of the initialization and the activation function on signal propagation in deep neural networks. In this paper, we connect these two theories by quantifying the impact of the initialization and the activation function on the NTK when the network depth becomes large. In particular, we provide a comprehensive analysis of the convergence rates of the NTK regime to the infinite depth regime.

preprint · 2022 · arXiv

Generative Models as Distributions of Functions

Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as functions, we can abstract away from the specific type of data we train on and construct models that are agnostic to discretization. To train our model, we use an adversarial approach with a discriminator that acts on continuous signals. Through experiments on a wide variety of data modalities including images, 3D shapes and climate data, we demonstrate that our model can learn rich distributions of functions independently of data type and resolution.

preprint · 2022 · arXiv

Importance Weighting Approach in Kernel Bayes' Rule

We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected kernel posterior features, based on regression from learned neural net or kernel features of the observations. All quantities involved in the Bayesian update are learned from observed data, making the method entirely model-free. The resulting algorithm is a novel instance of a kernel Bayes' rule (KBR), based on importance weighting. This results in superior numerical stability to the original approach to KBR, which requires operator inversion. We show the convergence of the estimator using a novel consistency analysis on the importance weighting estimator in the infinity norm. We evaluate KBR on challenging synthetic benchmarks, including a filtering problem with a state-space model involving high dimensional image observations. Importance weighted KBR yields uniformly better empirical performance than the original KBR, and competitive performance with other competing methods.

preprint · 2022 · arXiv

Learning Optimal Conformal Classifiers

Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stakes AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's predictions, e.g., its probability estimates, to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training the model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. Compared to standard training, ConfTr reduces the average confidence set size (inefficiency) of state-of-the-art CP methods applied after training. Moreover, it allows one to "shape" the confidence sets predicted at test time, which is difficult for standard CP. In experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.
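The post-training conformal step that this abstract takes as its starting point can be sketched in a few lines. This is a minimal illustration of standard split conformal prediction with a threshold on the nonconformity score 1 - p(true class), not ConfTr itself; the function names and the choice of score are assumptions made for the example.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    # calibration_scores: one nonconformity score per held-out calibration
    # example, here assumed to be 1 - (probability given to the true class).
    n = len(calibration_scores)
    # Finite-sample corrected quantile rank for coverage of at least 1 - alpha.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return sorted(calibration_scores)[rank - 1]

def confidence_set(class_probs, threshold):
    # Keep every class whose nonconformity score is within the threshold.
    return [k for k, p in enumerate(class_probs) if 1.0 - p <= threshold]

# Hypothetical calibration scores and a single test prediction.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
tau = conformal_threshold(scores, alpha=0.2)
pred_set = confidence_set([0.7, 0.25, 0.05], tau)
```

The size of `pred_set` is the inefficiency that the abstract refers to: a model whose probabilities are sharper on the calibration data yields a smaller threshold and hence smaller sets.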

preprint · 2022 · arXiv

Mitigating Statistical Bias within Differentially Private Synthetic Data

Increasing interest in privacy-preserving machine learning has led to new and evolved approaches for generating private synthetic data from undisclosed real data. However, mechanisms of privacy preservation can significantly reduce the utility of synthetic data, which in turn impacts downstream tasks such as learning predictive models or inference. We propose several re-weighting strategies using privatised likelihood ratios that not only mitigate statistical bias of downstream estimators but also have general applicability to differentially private generative models. Through large-scale empirical evaluation, we show that private importance weighting provides simple and effective privacy-compliant augmentation for general applications of synthetic data.

preprint · 2022 · arXiv

On PAC-Bayesian reconstruction guarantees for VAEs

Despite its wide use and empirical successes, the theoretical understanding and study of the behaviour and performance of the variational autoencoder (VAE) have only emerged in the past few years. We contribute to this recent line of work by analysing the VAE's reconstruction ability for unseen test data, leveraging arguments from the PAC-Bayes theory. We provide generalisation bounds on the theoretical reconstruction error, and provide insights on the regularisation effect of VAE objectives. We illustrate our theoretical results with supporting experiments on classical benchmark datasets.

preprint · 2022 · arXiv

Online Variational Filtering and Parameter Learning

We present a variational method for online state estimation and parameter learning in state-space models (SSMs), a ubiquitous class of latent variable models for sequential data. As per standard batch variational techniques, we use stochastic gradients to simultaneously optimize a lower bound on the log evidence with respect to both model parameters and a variational approximation of the states' posterior distribution. However, unlike existing approaches, our method is able to operate in an entirely online manner, such that historic observations do not require revisitation after being incorporated and the cost of updates at each time step remains constant, despite the growing dimensionality of the joint posterior distribution of the states. This is achieved by utilizing backward decompositions of this joint posterior distribution and of its variational approximation, combined with Bellman-type recursions for the evidence lower bound and its gradients. We demonstrate the performance of this methodology across several examples, including high-dimensional SSMs and sequential Variational Auto-Encoders.

preprint · 2022 · arXiv

Riemannian Diffusion Schrödinger Bridge

Score-based generative models exhibit state-of-the-art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting and Riemannian score-based methods have not yet been adapted to the important task of interpolation of datasets. To overcome these issues, we introduce Riemannian Diffusion Schrödinger Bridge. Our proposed method generalizes Diffusion Schrödinger Bridge introduced in De Bortoli et al. (2021) to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.

preprint · 2021 · arXiv

Asymptotic Properties of Recursive Maximum Likelihood Estimation in Non-Linear State-Space Models

Using stochastic gradient search and the optimal filter derivative, it is possible to perform recursive (i.e., online) maximum likelihood estimation in a non-linear state-space model. As the optimal filter and its derivative are analytically intractable for such a model, they need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Biometrika 2011], a recursive maximum likelihood algorithm based on a particle approximation to the optimal filter derivative has been proposed and studied through numerical simulations. Here, this algorithm and its asymptotic behavior are analyzed theoretically. We show that the algorithm accurately estimates maxima of the underlying (average) log-likelihood when the number of particles is sufficiently large. We also derive (relatively) tight bounds on the estimation error. The obtained results hold under (relatively) mild conditions and cover several classes of non-linear state-space models met in practice.

preprint · 2021 · arXiv

Bias of Particle Approximations to Optimal Filter Derivative

In many applications, a state-space model depends on a parameter which needs to be inferred from a data set. Quite often, it is necessary to perform the parameter inference online. In the maximum likelihood approach, this can be done using stochastic gradient search and the optimal filter derivative. However, the optimal filter and its derivative are not analytically tractable for a non-linear state-space model and need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Biometrika 2011], a particle approximation to the optimal filter derivative has been proposed, while the corresponding $L_{p}$ error bounds and the central limit theorem have been provided in [Del Moral, Doucet and Singh, SIAM Journal on Control and Optimization 2015]. Here, the bias of this particle approximation is analyzed. We derive (relatively) tight bounds on the bias in terms of the number of particles. Under (strong) mixing conditions, the bounds are uniform in time and inversely proportional to the number of particles. The obtained results apply to a (relatively) broad class of state-space models met in practice.

preprint · 2020 · arXiv

Ensemble Rejection Sampling

We introduce Ensemble Rejection Sampling, a scheme for exact simulation from the posterior distribution of the latent states of a class of non-linear non-Gaussian state-space models. Ensemble Rejection Sampling relies on a proposal for the high-dimensional state sequence built using ensembles of state samples. Although this algorithm can be interpreted as a rejection sampling scheme acting on an extended space, we show under regularity conditions that the expected computational cost to obtain an exact sample increases cubically with the length of the state sequence instead of exponentially for standard rejection sampling. We demonstrate this methodology by exactly sampling state sequences according to the posterior distribution of a stochastic volatility model and a non-linear autoregressive process. We also present an application to rare event simulation.

preprint · 2020 · arXiv

Gibbs flow for approximate transport with applications to Bayesian computation

Let $π_{0}$ and $π_{1}$ be two distributions on the Borel space $(\mathbb{R}^{d},\mathcal{B}(\mathbb{R}^{d}))$. Any measurable function $T:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ such that $Y=T(X)\sim π_{1}$ if $X\sim π_{0}$ is called a transport map from $π_{0}$ to $π_{1}$. For any $π_{0}$ and $π_{1}$, if one could obtain an analytical expression for a transport map from $π_{0}$ to $π_{1}$, then this could be straightforwardly applied to sample from any distribution. One would map draws from an easy-to-sample distribution $π_{0}$ to the target distribution $π_{1}$ using this transport map. Although it is usually impossible to obtain an explicit transport map for complex target distributions, we show here how to build a tractable approximation of a novel transport map. This is achieved by moving samples from $π_{0}$ using an ordinary differential equation with a velocity field that depends on the full conditional distributions of the target. Even when this ordinary differential equation is time-discretized and the full conditional distributions are numerically approximated, the resulting distribution of mapped samples can be efficiently evaluated and used as a proposal within sequential Monte Carlo samplers. We demonstrate significant gains over state-of-the-art sequential Monte Carlo samplers at a fixed computational complexity on a variety of applications.

preprint · 2020 · arXiv

Metropolis-Hastings with Averaged Acceptance Ratios

Markov chain Monte Carlo (MCMC) methods to sample from a probability distribution $π$ defined on a space $(Θ,\mathcal{T})$ consist of the simulation of realisations of Markov chains $\{θ_{n},n\geq1\}$ of invariant distribution $π$ and such that the distribution of $θ_{i}$ converges to $π$ as $i\rightarrow\infty$. In practice one is typically interested in the computation of expectations of functions, say $f$, with respect to $π$ and it is also required that averages $M^{-1}\sum_{n=1}^{M}f(θ_{n})$ converge to the expectation of interest. The iterative nature of MCMC makes it difficult to develop generic methods to take advantage of parallel computing environments when interested in reducing time to convergence. While numerous approaches have been proposed to reduce the variance of ergodic averages, including averaging over independent realisations of $\{θ_{n},n\geq1\}$ simulated on several computers, techniques to reduce the "burn-in" of MCMC are scarce. In this paper we explore a simple and generic approach to improve convergence to equilibrium of existing algorithms which rely on the Metropolis-Hastings (MH) update, the main building block of MCMC. The main idea is to use averages of the acceptance ratio w.r.t. multiple realisations of random variables involved, while preserving $π$ as invariant distribution. The methodology requires limited change to existing code, is naturally suited to parallel computing and is shown on our examples to provide substantial performance improvements both in terms of convergence to equilibrium and variance of ergodic averages. In some scenarios gains are observed even on a serial machine.

preprint · 2020 · arXiv

Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design

When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting (tests can be mistaken) to decide adaptively (looking at past results) which groups to test next, with the goal of converging to a good detection as quickly, and with as few tests, as possible. We cast this problem as a Bayesian sequential experimental design problem. Using the posterior distribution of infection status vectors for $n$ patients, given observed tests carried out so far, we seek to form groups that have a maximal utility. We consider utilities such as mutual information, but also quantities that have a more direct relevance to testing, such as the AUC of the ROC curve of the test. Practically, the posterior distributions on $\{0,1\}^n$ are approximated by sequential Monte Carlo (SMC) samplers and the utility maximized by a greedy optimizer. Our procedures show in simulations significant improvements over both adaptive and non-adaptive baselines, and are far more efficient than individual tests when disease prevalence is low. Additionally, we show empirically that loopy belief propagation (LBP), widely regarded as the SoTA decoder to decide whether an individual is infected or not given previous tests, can be unreliable and exhibit oscillatory behavior. Our SMC decoder is more reliable, and can improve the performance of other group testing algorithms.
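Dorfman's classical observation can be made concrete with a short calculation. The sketch below covers only the noiseless two-stage scheme (test a pool, then retest every member of a positive pool), not the noisy adaptive algorithms of the paper; it assumes independent infections, and the function names are hypothetical. With prevalence p and group size k, the expected number of tests per person is 1/k + 1 - (1 - p)^k.

```python
def dorfman_tests_per_person(prevalence, group_size):
    # Expected tests per person for noiseless two-stage Dorfman pooling,
    # assuming infections are independent with the given prevalence.
    if group_size == 1:
        return 1.0  # individual testing: exactly one test each
    p_pool_positive = 1.0 - (1.0 - prevalence) ** group_size
    # One pooled test shared by the group, plus one retest per member
    # whenever the pool comes back positive.
    return 1.0 / group_size + p_pool_positive

def best_group_size(prevalence, max_size=100):
    # Brute-force search over group sizes for the cheapest scheme.
    return min(range(1, max_size + 1),
               key=lambda k: dorfman_tests_per_person(prevalence, k))

low_prevalence_size = best_group_size(0.01)
high_prevalence_size = best_group_size(0.5)
```

At 1% prevalence the search picks groups of 11 and needs fewer than 0.2 tests per person, while at 50% prevalence it falls back to individual testing, which is why the low-prevalence regime is the interesting one.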

preprint · 2020 · arXiv

Unbiased Markov chain Monte Carlo for intractable target distributions

Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for latent variable models and the exchange algorithm for a class of undirected graphical models. As with any MCMC algorithm, the resulting estimators are justified asymptotically in the limit of the number of iterations, but exhibit a bias for any fixed number of iterations due to the Markov chains starting outside of stationarity. This "burn-in" bias is known to complicate the use of parallel processors for MCMC computations. We show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms. We establish the theoretical validity of some of these procedures by extending existing results to cover the case of polynomially ergodic Markov chains. The efficiency of the proposed estimators is compared with that of standard MCMC estimators, with theoretical arguments and numerical experiments including state space models and Ising models.

preprint · 2019 · arXiv

Schrödinger Bridge Samplers

Consider a reference Markov process with initial distribution $π_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given distribution $π_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $π_{0}$ and terminal distribution equal to $π_{T}$ which is the closest to the reference process in terms of Kullback--Leibler divergence. This special case of the so-called Schrödinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $π$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $π_T = π$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.
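Iterative proportional fitting is easiest to see in the static, discrete case. The sketch below is a simplification and not the sampler of the paper: it assumes a finite state space, a strictly positive reference joint `kernel`, and exact marginals, whereas the paper works with Markov processes and regression-based approximations. The function name `sinkhorn` and the example numbers are hypothetical. Each sweep rescales rows and then columns so that the coupling matches the prescribed marginals while staying of the form u_i * kernel_ij * v_j.

```python
def sinkhorn(kernel, row_marginal, col_marginal, n_iters=200):
    # kernel: strictly positive reference joint weights (list of rows).
    n, m = len(kernel), len(kernel[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that row sums match row_marginal.
        u = [row_marginal[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        # Rescale columns so that column sums match col_marginal.
        v = [col_marginal[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    # The fitted coupling keeps the form diag(u) * kernel * diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical example: the reference joint has uniform marginals,
# but the terminal marginal is required to be (0.3, 0.7).
reference = [[0.4, 0.1], [0.1, 0.4]]
coupling = sinkhorn(reference, [0.5, 0.5], [0.3, 0.7])
row_sums = [sum(row) for row in coupling]
col_sums = [sum(coupling[i][j] for i in range(2)) for j in range(2)]
```

After the iterations, `row_sums` and `col_sums` agree with the requested marginals up to numerical tolerance; among all couplings with those marginals, the fitted one is the closest to the reference in Kullback-Leibler divergence.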

preprint · 2018 · arXiv

Limit theorems for sequential MCMC methods

Sequential Monte Carlo (SMC) methods, also known as particle filters, constitute a class of algorithms used to approximate expectations with respect to a sequence of probability distributions as well as the normalising constants of those distributions. Sequential MCMC methods are an alternative class of techniques addressing similar problems in which particles are sampled according to an MCMC kernel rather than conditionally independently at each time step. These methods were introduced over twenty years ago by Berzuini et al. (1997). Recently, there has been a renewed interest in such algorithms as they demonstrate an empirical performance superior to that of SMC methods in some applications. We establish a strong law of large numbers and a central limit theorem for sequential MCMC methods and provide conditions under which errors can be controlled uniformly in time. In the context of state-space models, we provide conditions under which sequential MCMC methods can indeed outperform standard SMC methods in terms of asymptotic variance of the corresponding Monte Carlo estimators.

preprint · 2016 · arXiv

Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm

Unsupervised image segmentation aims at clustering the set of pixels of an image into spatially homogeneous regions. We introduce here a class of Bayesian nonparametric models to address this problem. These models are based on a combination of a Potts-like spatial smoothness component and a prior on partitions which is used to control both the number and size of clusters. This class of models is flexible enough to include the standard Potts model and the more recent Potts-Dirichlet Process model (Orbanz and Buhmann, 2008). More importantly, any prior on partitions can be introduced to control the global clustering structure so that it is possible to penalize small or large clusters if necessary. Bayesian computation is carried out using an original generalized Swendsen-Wang algorithm. Experiments demonstrate that our method is competitive in terms of Rand index compared to popular image segmentation methods, such as mean-shift, and recent alternative Bayesian nonparametric models.

preprint · 2016 · arXiv

On embedded hidden Markov models and particle Markov chain Monte Carlo methods

The embedded hidden Markov model (EHMM) sampling method is a Markov chain Monte Carlo (MCMC) technique for state inference in non-linear non-Gaussian state-space models which was proposed in Neal (2003); Neal et al. (2004) and extended in Shestopaloff and Neal (2016). An extension to Bayesian parameter inference was presented in Shestopaloff and Neal (2013). An alternative class of MCMC schemes addressing similar inference problems is provided by particle MCMC (PMCMC) methods (Andrieu et al. 2009; 2010). All these methods rely on the introduction of artificial extended target distributions for multiple state sequences which, by construction, are such that one randomly indexed sequence is distributed according to the posterior of interest. By adapting the Metropolis-Hastings algorithms developed in the framework of PMCMC methods to the EHMM framework, we obtain novel particle filter (PF)-type algorithms for state inference and novel MCMC schemes for parameter and state inference. In addition, we show that most of these algorithms can be viewed as particular cases of a general PF and PMCMC framework. We compare the empirical performance of the various algorithms on low- to high-dimensional state-space models. We demonstrate that a properly tuned conditional PF with "local" MCMC moves proposed in Shestopaloff and Neal (2016) can outperform the standard conditional PF significantly when applied to high-dimensional state-space models while the novel PF-type algorithm could prove to be an interesting alternative to standard PFs for likelihood estimation in some lower-dimensional scenarios.

preprint2015arXiv

Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models

Ionides, King et al. (see e.g. Inference for nonlinear dynamical systems, PNAS 103) have recently introduced an original approach to perform maximum likelihood parameter estimation in state-space models which only requires being able to simulate the latent Markov model according to its prior distribution. Their methodology relies on an approximation of the score vector for general statistical models based upon an artificial posterior distribution and bypasses the calculation of any derivative. We show here that this score estimator can be derived from a simple application of Stein's lemma and how an additional application of this lemma provides an original derivative-free estimator of the observed information matrix. We establish that these estimators exhibit robustness properties compared to finite difference estimators while their bias and variance scale as well as finite difference type estimators, including simultaneous perturbations (see e.g. Spall, IEEE Trans. on Automatic Control 37), with respect to the dimension of the parameter. For state-space models where sequential Monte Carlo computation is required, these estimators can be further improved. In this specific context, we derive original derivative-free estimators of the score vector and observed information matrix which are computed using sequential Monte Carlo approximations of smoothed additive functionals associated with a modified version of the original state-space model.

preprint2015arXiv

Expectation Particle Belief Propagation

We propose an original particle-based implementation of the Loopy Belief Propagation (LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous state space. The algorithm adaptively constructs efficient proposal distributions approximating the local beliefs at each node of the MRF. This is achieved by considering proposal distributions in the exponential family whose parameters are updated iteratively in an Expectation Propagation (EP) framework. The proposed particle scheme provides consistent estimation of the LBP marginals as the number of particles increases. We demonstrate that it provides more accurate results than the Particle Belief Propagation (PBP) algorithm of Ihler and McAllester (2009) at a fraction of the computational cost and is additionally more robust empirically. The computational complexity of our algorithm at each iteration is quadratic in the number of particles. We also propose an accelerated implementation with sub-quadratic computational complexity which still provides consistent estimates of the loopy BP marginal distributions and performs almost as well as the original procedure.

preprint2015arXiv

On Markov chain Monte Carlo methods for tall data

Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number $n$ of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have been recently proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require less than $O(n)$ data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we have only been able so far to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.

preprint2015arXiv

On Particle Methods for Parameter Estimation in State-Space Models

Nonlinear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems. However, in most applications, the state-space model of interest also depends on unknown static parameters that need to be estimated from the data. In this context, standard particle methods fail and it is necessary to rely on more sophisticated algorithms. The aim of this paper is to present a comprehensive review of particle methods that have been proposed to perform static parameter estimation in state-space models. We discuss the advantages and limitations of these methods and illustrate their performance on simple models.

preprint2014arXiv

Asynchronous Anytime Sequential Monte Carlo

We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional particle filtering algorithms. It uses no barrier synchronizations which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever to emit an unbounded number of particles while keeping within a fixed memory budget. We prove that the particle cascade is an unbiased marginal likelihood estimator which means that it can be straightforwardly plugged into existing pseudomarginal methods.

preprint2014arXiv

Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator

When an unbiased estimator of the likelihood is used within a Metropolis--Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of averages computed under this chain. Many Monte Carlo samples will typically result in Metropolis--Hastings averages with lower asymptotic variances than the corresponding Metropolis--Hastings averages using fewer samples. However, the computing time required to construct the likelihood estimator increases with the number of Monte Carlo samples. Under the assumption that the distribution of the additive noise introduced by the log-likelihood estimator is Gaussian with variance inversely proportional to the number of Monte Carlo samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We demonstrate our results by considering a stochastic volatility model applied to stock index returns.

preprint2014arXiv

Fast Computation of Wasserstein Barycenters

We present new algorithms to compute the mean of a set of empirical probability measures under the optimal transport metric. This mean, known as the Wasserstein barycenter, is the measure that minimizes the sum of its Wasserstein distances to each element in that set. We propose two original algorithms to compute Wasserstein barycenters that build upon the subgradient method. A direct implementation of these algorithms is, however, too costly because it would require the repeated resolution of large primal and dual optimal transport problems to compute subgradients. Extending the work of Cuturi (2013), we propose to smooth the Wasserstein distance used in the definition of Wasserstein barycenters with an entropic regularizer and recover in doing so a strictly convex objective whose gradients can be computed for a considerably cheaper computational cost using matrix scaling algorithms. We use these algorithms to visualize a large family of images and to solve a constrained clustering problem.
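The matrix scaling step mentioned above can be illustrated for a single pair of histograms. The following is a minimal sketch of Sinkhorn-style scaling for entropy-regularized optimal transport, not the barycenter algorithms of the paper; the function name `sinkhorn_plan`, the parameter names, and the default values are assumptions made for illustration.

```python
import math


def sinkhorn_plan(a, b, cost, reg=1.0, iterations=200):
    """Approximate the entropy-regularized transport plan between histograms a and b.

    a, b  : lists of non-negative weights with equal total mass (hypothetical inputs)
    cost  : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    reg   : entropic regularization strength; smaller values track unregularized
            optimal transport more closely but converge more slowly
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg).
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is diag(u) K diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

After convergence, the rows of the returned plan sum to `a` and its columns sum to `b`. Each iteration needs only matrix-vector products with the kernel, which is what makes the smoothed objective cheap to evaluate.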

preprint2014arXiv

Perfect simulation using atomic regeneration with application to Sequential Monte Carlo

Consider an irreducible, Harris recurrent Markov chain of transition kernel Π and invariant probability measure π. If Π satisfies a minorization condition, then the split chain allows the identification of regeneration times which may be exploited to obtain perfect samples from π. Unfortunately, many transition kernels associated with complex Markov chain Monte Carlo algorithms are analytically intractable, so establishing a minorization condition and simulating the split chain is challenging, if not impossible. For uniformly ergodic Markov chains with intractable transition kernels, we propose two efficient perfect simulation procedures of similar expected running time which are instances of the multigamma coupler and an imputation scheme. These algorithms overcome the intractability of the kernel by introducing an artificial atom and using a Bernoulli factory. We detail an application of these procedures when Π is the recently introduced iterated conditional Sequential Monte Carlo kernel. We additionally provide results on the general applicability of the methodology, and how Sequential Monte Carlo methods may be used to facilitate perfect simulation and/or unbiased estimation of expectations with respect to the stationary distribution of a non-uniformly ergodic Markov chain.

preprint2013arXiv

A Lognormal Central Limit Theorem for Particle Approximations of Normalizing Constants

This paper deals with the numerical approximation of normalizing constants produced by particle methods, in the general framework of Feynman-Kac sequences of measures. It is well-known that the corresponding estimates satisfy a central limit theorem for a fixed time horizon $n$ as the number of particles $N$ goes to infinity. Here, we study the situation where both $n$ and $N$ go to infinity in such a way that $\lim_{n\rightarrow\infty} n/N=α>0$. In this context, Pitt et al. \cite{pitt2012} recently conjectured that a lognormal central limit theorem should hold. We formally establish this result here, under general regularity assumptions on the model. We also discuss special classes of models (time-homogeneous environment and ergodic random environment) for which more explicit descriptions of the limiting bias and variance can be obtained.

preprint2013arXiv

Perfect simulation for the Feynman-Kac law on the path space

This paper describes an algorithm of interest. This is a preliminary version and we intend to write a better description of it and to obtain bounds for its complexity.

preprint2013arXiv

Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks

Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as "condensation", "sequential Monte Carlo" and "survival of the fittest". In this paper, we show how we can exploit the structure of the DBN to increase the efficiency of particle filtering, using a technique known as Rao-Blackwellisation. Essentially, this samples some of the variables, and marginalizes out the rest exactly, using the Kalman filter, HMM filter, junction tree algorithm, or any other finite dimensional optimal filter. We show that Rao-Blackwellised particle filters (RBPFs) lead to more accurate estimates than standard PFs. We demonstrate RBPFs on two problems, namely non-stationary online regression with radial basis function networks and robot localization and map building. We also discuss other potential application areas and provide references to some finite dimensional optimal filters.

preprint2013arXiv

Reversible Jump MCMC Simulated Annealing for Neural Networks

We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We also show that by calibrating a Bayesian model, we can obtain the classical AIC, BIC and MDL model selection criteria within a penalized likelihood framework. Finally, we show theoretically and empirically that the algorithm converges to the modes of the full posterior distribution in an efficient way.

preprint2012arXiv

An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration

While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains -- a feature which both decreases computational cost and stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm's adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.

preprint2012arXiv

Distributed Maximum Likelihood for Simultaneous Self-localization and Tracking in Sensor Networks

We show that the sensor self-localization problem can be cast as a static parameter estimation problem for Hidden Markov Models and we implement fully decentralized versions of the Recursive Maximum Likelihood and on-line Expectation-Maximization algorithms to localize the sensor network simultaneously with target tracking. For linear Gaussian models, our algorithms can be implemented exactly using a distributed version of the Kalman filter and a novel message passing algorithm. The latter allows each node to compute the local derivatives of the likelihood or the sufficient statistics needed for Expectation-Maximization. In the non-linear case, a solution based on local linearization in the spirit of the Extended Kalman Filter is proposed. In numerical examples we demonstrate that the developed algorithms are able to learn the localization parameters.

preprint2012arXiv

Fluctuations of Interacting Markov Chain Monte Carlo Methods

We present a multivariate central limit theorem for a general class of interacting Markov chain Monte Carlo algorithms used to solve nonlinear measure-valued equations. These algorithms generate stochastic processes which belong to the class of nonlinear Markov chains interacting with their empirical occupation measures. We develop an original theoretical analysis based on resolvent operators and semigroup techniques to analyze the fluctuations of their occupation measures around their limiting values.

preprint2012arXiv

Generalized Polya Urn for Time-varying Dirichlet Process Mixtures

Dirichlet Process Mixtures (DPMs) are a popular class of statistical models to perform density estimation and clustering. However, when the data available have a distribution evolving over time, such models are inadequate. We introduce here a class of time-varying DPMs which ensures that at each time step the random distribution follows a DPM model. Our model relies on an intuitive and simple generalized Polya urn scheme. Inference is performed using Markov chain Monte Carlo and Sequential Monte Carlo. We demonstrate our model on various applications.

preprint2012arXiv

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.

preprint2012arXiv

On adaptive resampling strategies for sequential Monte Carlo methods

Sequential Monte Carlo (SMC) methods are a class of techniques to sample approximately from any sequence of probability distributions using a combination of importance sampling and resampling steps. This paper is concerned with the convergence analysis of a class of SMC methods where the times at which resampling occurs are computed online using criteria such as the effective sample size. This is a popular approach amongst practitioners but there are very few convergence results available for these methods. By combining semigroup techniques with an original coupling argument, we obtain functional central limit theorems and uniform exponential concentration estimates for these algorithms.
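The effective-sample-size criterion mentioned above can be sketched as follows. This is a minimal illustration of one common adaptive rule (multinomial resampling when the ESS falls below a fraction of the particle count), not the specific schemes analysed in the paper; the function names and the default threshold of 0.5 are assumptions.

```python
import random


def effective_sample_size(weights):
    """ESS = (sum w)^2 / sum(w^2) for non-negative, possibly unnormalized weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def maybe_resample(particles, weights, threshold_fraction=0.5, rng=random):
    """Resample only when the ESS drops below threshold_fraction * N.

    Returns (particles, weights, resampled_flag). After resampling, every
    particle carries the mean of the previous weights, so the total weight
    is preserved.
    """
    n = len(particles)
    if effective_sample_size(weights) >= threshold_fraction * n:
        return particles, weights, False
    # Multinomial resampling: draw N indices with probability proportional to weight.
    resampled = rng.choices(particles, weights=weights, k=n)
    mean_weight = sum(weights) / n
    return resampled, [mean_weight] * n, True
```

With equal weights the ESS equals the number of particles, and it falls to 1 when a single particle carries all the weight. The resampling times therefore depend on the particles themselves, which is the feature that complicates the convergence analysis.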

preprint2012arXiv

Sequentially interacting Markov chain Monte Carlo methods

Sequential Monte Carlo (SMC) is a methodology for sampling approximately from a sequence of probability distributions of increasing dimension and estimating their normalizing constants. We propose here an alternative methodology named Sequentially Interacting Markov Chain Monte Carlo (SIMCMC). SIMCMC methods work by generating interacting non-Markovian sequences which behave asymptotically like independent Metropolis-Hastings (MH) Markov chains with the desired limiting distributions. Contrary to SMC, SIMCMC allows us to iteratively improve our estimates in an MCMC-like fashion. We establish convergence results under realistic verifiable assumptions and demonstrate its performance on several examples arising in Bayesian time series analysis.

preprint2012arXiv

Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.

preprint2012arXiv

Sparsity-Promoting Bayesian Dynamic Linear Models

Sparsity-promoting priors have become increasingly popular over recent years due to an increased number of regression and classification applications involving a large number of predictors. In time series applications where observations are collected over time, it is often unrealistic to assume that the underlying sparsity pattern is fixed. We propose here an original class of flexible Bayesian linear models for dynamic sparsity modelling. The proposed class of models expands upon the existing Bayesian literature on sparse regression using generalized multivariate hyperbolic distributions. The properties of the models are explored through both analytic results and simulation studies. We demonstrate the model on a financial application where it is shown that it accurately represents the patterns seen in the analysis of stock and derivative data, and is able to detect major events by filtering an artificial portfolio of assets.

preprint2012arXiv

Toward Practical $N^2$ Monte Carlo: the Marginal Particle Filter

Sequential Monte Carlo techniques are useful for state estimation in non-linear, non-Gaussian dynamic models. These methods allow us to approximate the joint posterior distribution using sequential importance sampling. In this framework, the dimension of the target distribution grows with each time step, thus it is necessary to introduce some resampling steps to ensure that the estimates provided by the algorithm have a reasonable variance. In many applications, we are only interested in the marginal filtering distribution which is defined on a space of fixed dimension. We present a Sequential Monte Carlo algorithm called the Marginal Particle Filter which operates directly on the marginal distribution, hence avoiding having to perform importance sampling on a space of growing dimension. Using this idea, we also derive an improved version of the auxiliary particle filter. We show theoretic and empirical results which demonstrate a reduction in variance over conventional particle filtering, and present techniques for reducing the cost of the marginal particle filter with $N$ particles from $O(N^2)$ to $O(N \log N)$.

preprint2011arXiv

Autoregressive Kernels For Time Series

We propose in this work a new family of kernels for variable-length time series. Our work builds upon the vector autoregressive (VAR) model for multivariate stochastic processes: given a multivariate time series x, we consider the likelihood function p_θ(x) of different parameters θ in the VAR model as features to describe x. To compare two time series x and x', we form the product of their features p_θ(x) p_θ(x') which is integrated out w.r.t. θ using a matrix normal-inverse Wishart prior. Among other properties, this kernel can be easily computed when the dimension d of the time series is much larger than the lengths of the considered time series x and x'. It can also be generalized to time series taking values in arbitrary state spaces, as long as the state space itself is endowed with a kernel κ. In that case, the kernel between x and x' is a function of the Gram matrices produced by κ on observations and subsequences of observations enumerated in x and x'. We describe a computationally efficient implementation of this generalization that uses low-rank matrix factorization techniques. These kernels are compared to other known kernels using a set of benchmark classification tasks carried out with support vector machines.

preprint2011arXiv

Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors

We explore the use of generalized t priors on regression coefficients to help understand the nature of association signal within "hit regions" of genome-wide association studies. The particular generalized t distribution we adopt is a Student distribution on the absolute value of its argument. For low degrees of freedom we show that the generalized t exhibits 'sparsity-prior' properties with some attractive features over other common forms of sparse priors and includes the well known double-exponential distribution as the degrees of freedom tends to infinity. We pay particular attention to graphical representations of posterior statistics obtained from sparsity-path-analysis (SPA) where we sweep over the setting of the scale (shrinkage / precision) parameter in the prior to explore the space of posterior models obtained over a range of complexities, from very sparse models with all coefficient distributions heavily concentrated around zero, to models with diffuse priors and coefficients distributed around their maximum likelihood estimates. The SPA plots are akin to LASSO plots of maximum a posteriori (MAP) estimates but they characterise the complete marginal posterior distributions of the coefficients plotted as a function of the precision of the prior. Generating posterior distributions over a range of prior precisions is computationally challenging but naturally amenable to sequential Monte Carlo (SMC) algorithms indexed on the scale parameter. We show how SMC simulation on graphic-processing-units (GPUs) provides very efficient inference for SPA. We also present a scale-mixture representation of the generalized t prior that leads to an EM algorithm to obtain MAP estimates should only these be required.

preprint2011arXiv

Calibration and filtering for multi factor commodity models with seasonality: incorporating panel data from futures contracts

We examine a general multi-factor model for commodity spot prices and futures valuation. We extend the multi-factor long-short model in Schwartz and Smith (2000) and Yan (2002) in two important aspects: firstly we allow for both the long and short term dynamic factors to be mean reverting incorporating stochastic volatility factors and secondly we develop an additive structural seasonality model. Then a Milstein discretized non-linear stochastic volatility state space representation for the model is developed which allows for futures and options contracts in the observation equation. We then develop numerical methodology based on an advanced Sequential Monte Carlo algorithm utilising Particle Markov chain Monte Carlo to perform calibration of the model jointly with the filtering of the latent processes for the long-short dynamics and volatility factors. In this regard we explore and develop a novel methodology based on an adaptive Rao-Blackwellised version of the Particle Markov chain Monte Carlo methodology. In doing this we deal accurately with the non-linearities in the state-space model which are therefore introduced into the filtering framework. We perform analysis on synthetic and real data for oil commodities.

preprint2011arXiv

On nonlinear Markov chain Monte Carlo

Let $\mathscr{P}(E)$ be the space of probability measures on a measurable space $(E,\mathcal{E})$. In this paper we introduce a class of nonlinear Markov chain Monte Carlo (MCMC) methods for simulating from a probability measure $π\in\mathscr{P}(E)$. Nonlinear Markov kernels (see [Feynman--Kac Formulae: Genealogical and Interacting Particle Systems with Applications (2004) Springer]) $K:\mathscr{P}(E)\times E\rightarrow\mathscr{P}(E)$ can be constructed to, in some sense, improve over MCMC methods. However, such nonlinear kernels cannot be simulated exactly, so approximations of the nonlinear kernels are constructed using auxiliary or potentially self-interacting chains. Several nonlinear kernels are presented and it is demonstrated that, under some conditions, the associated approximations exhibit a strong law of large numbers; our proof technique is via the Poisson equation and Foster--Lyapunov conditions. We investigate the performance of our approximations with some simulations.

preprint2011arXiv

Uniform Stability of a Particle Approximation of the Optimal Filter Derivative

Sequential Monte Carlo methods, also known as particle methods, are a widely used set of computational tools for inference in non-linear non-Gaussian state-space models. In many applications it may be necessary to compute the sensitivity, or derivative, of the optimal filter with respect to the static parameters of the state-space model; for instance, in order to obtain maximum likelihood model parameters of interest, or to compute the optimal controller in an optimal control problem. In Poyiadjis et al. [2011] an original particle algorithm to compute the filter derivative was proposed and it was shown using numerical examples that the particle estimate was numerically stable in the sense that it did not deteriorate over time. In this paper we substantiate this claim with a detailed theoretical study. $L_p$ bounds and a central limit theorem for this particle approximation of the filter derivative are presented. It is further shown that under mixing conditions these $L_p$ bounds and the asymptotic variance characterized by the central limit theorem are uniformly bounded with respect to the time index. We demonstrate the performance predicted by theory with several numerical examples. We also use the particle approximation of the filter derivative to perform online maximum likelihood parameter estimation for a stochastic volatility model.

preprint2010arXiv

A Hierarchical Bayesian Framework for Constructing Sparsity-inducing Priors

Variable selection techniques have become increasingly popular amongst statisticians due to an increased number of regression and classification applications involving high-dimensional data where we expect some predictors to be unimportant. In this context, Bayesian variable selection techniques involving Markov chain Monte Carlo exploration of the posterior distribution over models can be prohibitively computationally expensive and so there has been attention paid to quasi-Bayesian approaches such as maximum a posteriori (MAP) estimation using priors that induce sparsity in such estimates. We focus on this latter approach, expanding on the hierarchies proposed to date to provide a Bayesian interpretation and generalization of state-of-the-art penalized optimization approaches and providing simultaneously a natural way to include prior information about parameters within this framework. We give examples of how to use this hierarchy to compute MAP estimates for linear and logistic regression as well as sparse precision-matrix estimates in Gaussian graphical models. In addition, an adaptive group lasso method is derived using the framework.

preprint2010arXiv

Channel Tracking for Relay Networks via Adaptive Particle MCMC

This paper presents a new approach for channel tracking and parameter estimation in cooperative wireless relay networks. We consider a system with multiple relay nodes operating under an amplify-and-forward relay function. We develop a novel algorithm to efficiently solve the challenging problem of joint channel tracking and parameter estimation of the Jakes' system model within a mobile wireless relay network. This is based on the particle Markov chain Monte Carlo (PMCMC) method. In particular, it first involves developing a Bayesian state space model, then estimating the associated high dimensional posterior using an adaptive Markov chain Monte Carlo (MCMC) sampler relying on a proposal built using a Rao-Blackwellised Sequential Monte Carlo (SMC) filter.

preprint2010arXiv

Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"

This is a collection of discussions of "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by Girolami and Calderhead, to appear in the Journal of the Royal Statistical Society, Series B.

preprint2010arXiv

Efficient Bayesian Inference for Generalized Bradley-Terry Models

The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated to suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
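For readers unfamiliar with the MM iteration referred to above, here is a minimal sketch of the classical update for the basic Bradley-Terry model, in which item i beats item j with probability skill[i] / (skill[i] + skill[j]). It is not the EM reinterpretation or the Gibbs samplers of the paper; the function name, the `wins` matrix layout, and the fixed iteration count are assumptions, and every item is assumed to take part in at least one comparison.

```python
def bradley_terry_mm(wins, iterations=200):
    """Maximum likelihood skills for the basic Bradley-Terry model via MM updates.

    wins[i][j] is the number of times item i beat item j (hypothetical input layout).
    Returns skills normalized to sum to one.
    """
    n = len(wins)
    skill = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Total wins of item i.
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons between i and j, weighted by 1 / (skill_i + skill_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (skill[i] + skill[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Skills are identifiable only up to scale, so renormalize each sweep.
        norm = sum(updated)
        skill = [s / norm for s in updated]
    return skill
```

For two items where the first wins three of four comparisons, the iteration returns skills of 0.75 and 0.25, matching the empirical win frequency. Convergence to the maximum likelihood estimate generally requires that the comparison data connect the items sufficiently well.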

preprint2010arXiv

Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo Methods

Switching state-space models (SSSM) are a very popular class of time series models that have found many applications in statistics, econometrics and advanced signal processing. Bayesian inference for these models typically relies on Markov chain Monte Carlo (MCMC) techniques. However, even sophisticated MCMC methods dedicated to SSSM can prove quite inefficient as they update potentially strongly correlated discrete-valued latent variables one-at-a-time (Carter and Kohn, 1996; Gerlach et al., 2000; Giordani and Kohn, 2008). Particle Markov chain Monte Carlo (PMCMC) methods are a recently developed class of MCMC algorithms which use particle filters to build efficient proposal distributions in high dimensions (Andrieu et al., 2010). The existing PMCMC methods of Andrieu et al. (2010) are applicable to SSSM, but are restricted to employing standard particle filtering techniques. Yet, in the context of discrete-valued latent variables, specialised particle techniques have been developed which can outperform standard methods by up to an order of magnitude (Fearnhead, 1998; Fearnhead and Clifford, 2003; Fearnhead, 2004). In this paper we develop a novel class of PMCMC methods relying on these very efficient particle algorithms. We establish the theoretical validity of this new generic methodology, referred to as discrete PMCMC, and demonstrate it on a variety of examples including a multiple change-points model for well-log data and a model for U.S./U.K. exchange rate data. Discrete PMCMC algorithms are shown experimentally to outperform state-of-the-art MCMC techniques for a fixed computational complexity. Additionally, they can be easily parallelized (Lee et al., 2010), which allows further substantial gains.

preprint2010arXiv

Forward Smoothing using Sequential Monte Carlo

Sequential Monte Carlo (SMC) methods are a widely used set of computational tools for inference in non-linear non-Gaussian state-space models. We propose a new SMC algorithm to compute the expectation of additive functionals recursively. Essentially, it is an online or forward-only implementation of a forward filtering backward smoothing SMC algorithm proposed in Doucet et al. (2000). Compared to the standard path space SMC estimator, whose asymptotic variance increases quadratically with time even under favourable mixing assumptions, the asymptotic variance of the proposed SMC estimator only increases linearly with time. This forward smoothing procedure allows us to implement online maximum likelihood parameter estimation algorithms which do not suffer from the particle path degeneracy problem.

preprint2010arXiv

Grouping Priors and the Bayesian Elastic Net

In the literature surrounding Bayesian penalized regression, the two primary choices of prior distribution on the regression coefficients are zero-mean Gaussian and Laplace. While both have been compared numerically and theoretically, there remains little guidance on which to use in real-life situations. We propose two viable solutions to this problem in the form of prior distributions which combine and compromise between Laplace and Gaussian priors, respectively. Through cross-validation the prior which optimizes prediction performance is automatically selected. We then demonstrate the improved performance of these new prior distributions relative to Laplace and Gaussian priors in both a simulated and experimental environment.

preprint2010arXiv

Interacting Markov chain Monte Carlo methods for solving nonlinear measure-valued equations

We present a new class of interacting Markov chain Monte Carlo algorithms for solving numerically discrete-time measure-valued equations. The associated stochastic processes belong to the class of self-interacting Markov chains. In contrast to traditional Markov chains, their time evolutions depend on the occupation measure of their past values. This general methodology allows us to provide a natural way to sample from a sequence of target probability measures of increasing complexity. We develop an original theoretical analysis to analyze the behavior of these iterative algorithms which relies on measure-valued processes and semigroup techniques. We establish a variety of convergence results including exponential estimates and a uniform convergence theorem with respect to the number of target distributions. We also illustrate these algorithms in the context of Feynman-Kac distribution flows.

preprint2010arXiv

Particle approximation of the intensity measures of a spatial branching point process arising in multi-target tracking

The aim of this paper is two-fold. First, we analyze the sequence of intensity measures of a spatial branching point process arising in a multiple target tracking context. We study its stability properties, characterize its long time behavior and provide a series of weak Lipschitz type functional contraction inequalities. Second, we design and analyze an original particle scheme to approximate numerically these intensity measures. Under appropriate regularity conditions, we obtain uniform and non-asymptotic estimates and a functional central limit theorem. To the best of our knowledge, these are the first sharp theoretical results available for this class of spatial branching point processes.

preprint2009arXiv

On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers. For certain classes of Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.

64 published item(s)

Metropolis-Adjusted Diffusion Models

On the Wasserstein Gradient Flow Interpretation of Drifting Models

Ranking In Generalized Linear Bandits

A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs

An Empirical Study of Implicit Regularization in Deep Offline RL

Chained Generalisation Bounds

Conditional Simulation Using Diffusion Schrödinger Bridges

Conditionally Gaussian PAC-Bayes

Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

Generative Models as Distributions of Functions

Importance Weighting Approach in Kernel Bayes' Rule

Learning Optimal Conformal Classifiers

Mitigating Statistical Bias within Differentially Private Synthetic Data

On PAC-Bayesian reconstruction guarantees for VAEs

Online Variational Filtering and Parameter Learning

Riemannian Diffusion Schrödinger Bridge

Asymptotic Properties of Recursive Maximum Likelihood Estimation in Non-Linear State-Space Models

Bias of Particle Approximations to Optimal Filter Derivative

Ensemble Rejection Sampling

Gibbs flow for approximate transport with applications to Bayesian computation

Metropolis-Hastings with Averaged Acceptance Ratios

Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design

Unbiased Markov chain Monte Carlo for intractable target distributions

Schrödinger Bridge Samplers

Limit theorems for sequential MCMC methods

Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm

On embedded hidden Markov models and particle Markov chain Monte Carlo methods

Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models

Expectation Particle Belief Propagation

On Markov chain Monte Carlo methods for tall data

On Particle Methods for Parameter Estimation in State-Space Models

Asynchronous Anytime Sequential Monte Carlo

Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator

Fast Computation of Wasserstein Barycenters

Perfect simulation using atomic regeneration with application to Sequential Monte Carlo

A Lognormal Central Limit Theorem for Particle Approximations of Normalizing Constants

Perfect simulation for the Feynman-Kac law on the path space

Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks

Reversible Jump MCMC Simulated Annealing for Neural Networks

An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration

Distributed Maximum Likelihood for Simultaneous Self-localization and Tracking in Sensor Networks

Fluctuations of Interacting Markov Chain Monte Carlo Methods

Generalized Polya Urn for Time-varying Dirichlet Process Mixtures

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

On adaptive resampling strategies for sequential Monte Carlo methods

Sequentially interacting Markov chain Monte Carlo methods

Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

Sparsity-Promoting Bayesian Dynamic Linear Models

Toward Practical N^2 Monte Carlo: the Marginal Particle Filter

Autoregressive Kernels For Time Series

Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors

Calibration and filtering for multi factor commodity models with seasonality: incorporating panel data from futures contracts

On nonlinear Markov chain Monte Carlo

Uniform Stability of a Particle Approximation of the Optimal Filter Derivative

A Hierarchical Bayesian Framework for Constructing Sparsity-inducing Priors

Channel Tracking for Relay Networks via Adaptive Particle MCMC

Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"

Efficient Bayesian Inference for Generalized Bradley-Terry Models

Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo Methods

Forward Smoothing using Sequential Monte Carlo

Grouping Priors and the Bayesian Elastic Net

Interacting Markov chain Monte Carlo methods for solving nonlinear measure-valued equations

Particle approximation of the intensity measures of a spatial branching point process arising in multi-target tracking

On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods