Source author record

Nicolas Chopin

Nicolas Chopin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computation; Methodology; math.ST; Statistics Theory; Machine Learning; Applications; math.PR; astro-ph.CO; Digital Libraries; Distributed, Parallel, and Cluster Computing; Information Retrieval; math.HO; physics.soc-ph; stat.OT

Catalog footprint

What is connected

38 works

14 topics

4 close collaborators

preprint · 2023 · arXiv

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

An increasingly important building block of large scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple, yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes in the order of millions.

preprint · 2022 · arXiv

De-Sequentialized Monte Carlo: a parallel-in-time particle smoother

Particle smoothers are SMC (Sequential Monte Carlo) algorithms designed to approximate the joint distribution of the states given observations from a state-space model. We propose dSMC (de-Sequentialized Monte Carlo), a new particle smoother that is able to process $T$ observations in $\mathcal{O}(\log T)$ time on parallel architecture. This compares favourably with standard particle smoothers, the complexity of which is linear in $T$. We derive $\mathcal{L}_p$ convergence results for dSMC, with an explicit upper bound, polynomial in $T$. We then discuss how to reduce the variance of the smoothing estimates computed by dSMC by (i) designing good proposal distributions for sampling the particles at the initialization of the algorithm, as well as by (ii) using lazy resampling to increase the number of particles used in dSMC. Finally, we design a particle Gibbs sampler based on dSMC, which is able to perform parameter inference in a state-space model at a $\mathcal{O}(\log(T))$ cost on parallel hardware.

preprint · 2022 · arXiv

Improved Gibbs samplers for Cosmic Microwave Background power spectrum estimation

We study different variants of the Gibbs sampler algorithm from the perspective of their applicability to the estimation of power spectra of the cosmic microwave background (CMB) anisotropies. These include approaches studied earlier in the CMB literature as well as new ones which are proposed in this work. We demonstrate all these variants on full and cut sky simulations and compare their performance, assessing both their computational and statistical efficiency. For this we employ a consistent comparison metric, an effective sample size (ESS) per second, commonly used in this context in the statistical literature. We show that one of the proposed approaches, referred to as Centered overrelax, which capitalizes on additional, auxiliary variables to minimize computational time needed per sample, and uses overrelaxation to decorrelate subsequent samples, performs better than the standard Gibbs sampler by a factor between one and two orders of magnitude in the nearly full-sky, satellite-like cases. It therefore potentially provides an interesting alternative to the currently favored approaches.

preprint · 2022 · arXiv

On resampling schemes for particle filters with weakly informative observations

We consider particle filters with weakly informative observations (or 'potentials') relative to the latent state dynamics. The particular focus of this work is on particle filters to approximate time-discretisations of continuous-time Feynman-Kac path integral models (a scenario that naturally arises when addressing filtering and smoothing problems in continuous time), but our findings are indicative of weakly informative settings beyond this context too. We study the performance of different resampling schemes, such as systematic resampling, SSP (Srinivasan sampling process) and stratified resampling, as the time-discretisation becomes finer and also identify their continuous-time limit, which is expressed as a suitably defined 'infinitesimal generator'. By contrasting these generators, we find that (certain modifications of) systematic and SSP resampling 'dominate' stratified and independent 'killing' resampling in terms of their limiting overall resampling rate. The reduced intensity of resampling manifests itself in lower variance in our numerical experiment. This efficiency result, through an ordering of the resampling rate, is new to the literature. The second major contribution of this work concerns the analysis of the limiting behaviour of the entire population of particles of the particle filter as the time discretisation becomes finer. We provide the first proof, under general conditions, that the particle approximation of the discretised continuous-time Feynman-Kac path integral models converges to a (uniformly weighted) continuous-time particle system.

preprint · 2020 · arXiv

Adaptive Tuning Of Hamiltonian Monte Carlo Within Sequential Monte Carlo

Sequential Monte Carlo (SMC) samplers form an attractive alternative to MCMC for Bayesian computation. However, their performance depends strongly on the Markov kernels used to rejuvenate particles. We discuss how to calibrate automatically (using the current particles) Hamiltonian Monte Carlo kernels within SMC. To do so, we build upon the adaptive SMC approach of Fearnhead and Taylor (2013), and we also suggest alternative methods. We illustrate the advantages of using HMC kernels within an SMC sampler via an extensive numerical study.

preprint · 2020 · arXiv

Metropolis-Hastings with Averaged Acceptance Ratios

Markov chain Monte Carlo (MCMC) methods to sample from a probability distribution $π$ defined on a space $(Θ,\mathcal{T})$ consist of the simulation of realisations of Markov chains $\{θ_{n},n\geq1\}$ of invariant distribution $π$ and such that the distribution of $θ_{i}$ converges to $π$ as $i\rightarrow\infty$. In practice one is typically interested in the computation of expectations of functions, say $f$, with respect to $π$ and it is also required that averages $M^{-1}\sum_{n=1}^{M}f(θ_{n})$ converge to the expectation of interest. The iterative nature of MCMC makes it difficult to develop generic methods to take advantage of parallel computing environments when interested in reducing time to convergence. While numerous approaches have been proposed to reduce the variance of ergodic averages, including averaging over independent realisations of $\{θ_{n},n\geq1\}$ simulated on several computers, techniques to reduce the "burn-in" of MCMC are scarce. In this paper we explore a simple and generic approach to improve convergence to equilibrium of existing algorithms which rely on the Metropolis-Hastings (MH) update, the main building block of MCMC. The main idea is to use averages of the acceptance ratio w.r.t. multiple realisations of random variables involved, while preserving $π$ as invariant distribution. The methodology requires limited change to existing code, is naturally suited to parallel computing and is shown on our examples to provide substantial performance improvements both in terms of convergence to equilibrium and variance of ergodic averages. In some scenarios gains are observed even on a serial machine.

preprint · 2020 · arXiv

Negative association, ordering and convergence of resampling methods

We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost-sure weak convergence of measures output from Kitagawa's (1996) stratified resampling method. Carpenter et al.'s (1999) systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of Srinivasan (2001), which shares some attractive properties of systematic resampling, but which exhibits negative association and therefore converges irrespective of the order of the input samples. We confirm a conjecture made by Kitagawa (1996) that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^d$, the variance of the resampling error is ${\scriptscriptstyle\mathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
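To make the structural similarity between stratified and systematic resampling concrete, here is a minimal Python sketch of the two textbook schemes. It is an illustration only, not the paper's code: the function names and the toy weights are assumptions, the weights are assumed to be normalised, and the Srinivasan-based scheme introduced in the paper is not implemented.

```python
import random
from itertools import accumulate


def _inverse_cdf(sorted_points, weights):
    """Map sorted points in [0, 1) to particle indices via the cumulative weights."""
    cumulative = list(accumulate(weights))
    indices, j = [], 0
    for u in sorted_points:
        # The bound on j guards against rounding in the last cumulative sum.
        while j < len(weights) - 1 and u > cumulative[j]:
            j += 1
        indices.append(j)
    return indices


def stratified_resample(weights, rng):
    """One independent uniform per stratum [i/N, (i+1)/N)."""
    n = len(weights)
    return _inverse_cdf([(i + rng.random()) / n for i in range(n)], weights)


def systematic_resample(weights, rng):
    """A single uniform shift shared by all N strata."""
    n = len(weights)
    shift = rng.random()
    return _inverse_cdf([(i + shift) / n for i in range(n)], weights)


rng = random.Random(0)
weights = [0.1, 0.2, 0.3, 0.4]  # hypothetical normalised weights
print("stratified:", stratified_resample(weights, rng))
print("systematic:", systematic_resample(weights, rng))
```

The only difference is whether each stratum gets its own uniform or all strata share one. Sharing a single uniform is what makes the output of systematic resampling depend on the order of the input samples.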

preprint · 2016 · arXiv

Control functionals for Monte Carlo integration

A non-parametric extension of control variates is presented. These leverage gradient information on the sampling density to achieve substantial variance reduction. It is not required that the sampling density be normalised. The novel contribution of this work is based on two important insights; (i) a trade-off between random sampling and deterministic approximation and (ii) a new gradient-based function space derived from Stein's identity. Unlike classical control variates, our estimators achieve super-root-$n$ convergence, often requiring orders of magnitude fewer simulations to achieve a fixed level of precision. Theoretical and empirical results are presented, the latter focusing on integration problems arising in hierarchical models and models based on non-linear ordinary differential equations.
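As a rough illustration of the Stein-identity ingredient, the sketch below builds a single classical (parametric) control variate $\psi(x) = \phi'(x) + \phi(x)\,\nabla \log p(x)$, which has zero mean under $p$ and only needs the score of $p$, so no normalising constant. This is not the non-parametric control functional estimator of the paper. The function name, the choice $\phi(x) = x$, the standard normal target and the integrand $f(x) = x^4$ are all assumptions made for the example.

```python
import random
import statistics


def stein_control_variate_estimate(f, phi, dphi, score, xs):
    """Estimate E[f(X)] with one Stein-based control variate.

    psi(x) = phi'(x) + phi(x) * score(x) has zero mean under the target,
    so f(x) - c * psi(x) has the same mean as f(x) for any constant c.
    Returns (plain estimate, adjusted estimate, plain variance, adjusted variance).
    """
    fx = [f(x) for x in xs]
    psi = [dphi(x) + phi(x) * score(x) for x in xs]
    f_bar, psi_bar = statistics.fmean(fx), statistics.fmean(psi)
    cov = statistics.fmean((a - f_bar) * (b - psi_bar) for a, b in zip(fx, psi))
    var = statistics.fmean((b - psi_bar) ** 2 for b in psi)
    c = cov / var  # in-sample least-squares coefficient
    adjusted = [a - c * b for a, b in zip(fx, psi)]
    return (f_bar, statistics.fmean(adjusted),
            statistics.pvariance(fx), statistics.pvariance(adjusted))


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # target: N(0, 1)
plain, adjusted, v_plain, v_adjusted = stein_control_variate_estimate(
    f=lambda x: x ** 4,      # E[X^4] = 3 under N(0, 1)
    phi=lambda x: x,         # hypothetical test function
    dphi=lambda x: 1.0,
    score=lambda x: -x,      # gradient of log density of N(0, 1)
    xs=samples,
)
print(f"plain estimate:    {plain:.3f} (per-sample variance {v_plain:.1f})")
print(f"adjusted estimate: {adjusted:.3f} (per-sample variance {v_adjusted:.1f})")
```

With a fixed, finite-dimensional $\phi$ the estimator still converges at the usual root-$n$ rate. The faster rate described in the abstract comes from the non-parametric function space, which this sketch does not attempt.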

preprint · 2015 · arXiv

Application of Sequential Quasi-Monte Carlo to Autonomous Positioning

Sequential Monte Carlo algorithms (also known as particle filters) are popular methods to approximate filtering (and related) distributions of state-space models. However, they converge at the slow $1/\sqrt{N}$ rate, which may be an issue in real-time data-intensive scenarios. We give a brief outline of SQMC (Sequential Quasi-Monte Carlo), a variant of SMC based on low-discrepancy point sets proposed by Gerber and Chopin (2015), which converges at a faster rate, and we illustrate the greater performance of SQMC on autonomous positioning problems.

preprint · 2015 · arXiv

Convergence of Sequential Quasi-Monte Carlo Smoothing Algorithms

Gerber and Chopin (2015) recently introduced Sequential quasi-Monte Carlo (SQMC) algorithms as an efficient way to perform filtering in state-space models. The basic idea is to replace random variables with low-discrepancy point sets, so as to obtain faster convergence than with standard particle filtering. Gerber and Chopin (2015) describe briefly several ways to extend SQMC to smoothing, but do not provide supporting theory for this extension. We discuss more thoroughly how smoothing may be performed within SQMC, and derive convergence results for the so-obtained smoothing algorithms. We consider in particular SQMC equivalents of forward smoothing and forward filtering backward sampling, which are the most well-known smoothing techniques. As a preliminary step, we provide a generalization of the classical result of Hlawka and Mück (1972) on the transformation of QMC point sets into low-discrepancy point sets with respect to non-uniform distributions. As a corollary of the latter, we note that we can slightly weaken the assumptions to prove the consistency of SQMC.

preprint · 2015 · arXiv

Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference

ABC algorithms are notoriously expensive in computing time, as they require simulating many complete artificial datasets from the model. We advocate in this paper a "divide and conquer" approach to ABC, where we split the likelihood into n factors, and combine in some way n "local" ABC approximations of each factor. This has two advantages: (a) such an approach is typically much faster than standard ABC and (b) it makes it possible to use local summary statistics (i.e. summary statistics that depend only on the data-points that correspond to a single factor), rather than global summary statistics (that depend on the complete dataset). This greatly alleviates the bias introduced by summary statistics, and even removes it entirely in situations where local summary statistics are simply the identity function. We focus on EP (Expectation-Propagation), a convenient and powerful way to combine n local approximations into a global approximation. Compared to the EP-ABC approach of Barthelmé and Chopin (2014), we present two variations, one based on the parallel EP algorithm of Cseke and Heskes (2011), which has the advantage of being implementable on a parallel architecture, and one version which bridges the gap between standard EP and parallel EP. We illustrate our approach with an expensive application of ABC, namely inference on spatial extremes.

preprint · 2015 · arXiv

Leave Pima Indians alone: binary regression as a benchmark for Bayesian computation

Whenever a new approach to perform Bayesian computation is introduced, a common practice is to showcase this approach on a binary regression model and datasets of moderate size. This paper discusses to what extent this practice is sound. It also reviews the current state of the art of Bayesian computation, using binary regression as a running example. Both sampling-based algorithms (importance sampling, MCMC and SMC) and fast approximations (Laplace and EP) are covered. Extensive numerical results are provided, some of which might go against conventional wisdom regarding the effectiveness of certain algorithms. Implications for other problems (variable selection) and other models are also discussed.

preprint · 2015 · arXiv

On particle Gibbs sampling

The particle Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm to sample from the full posterior distribution of a state-space model. It does so by executing Gibbs sampling steps on an extended target distribution defined on the space of the auxiliary variables generated by an interacting particle system. This paper makes the following contributions to the theoretical study of this algorithm. Firstly, we present a coupling construction between two particle Gibbs updates from different starting points and we show that the coupling probability may be made arbitrarily close to one by increasing the number of particles. We obtain as a direct corollary that the particle Gibbs kernel is uniformly ergodic. Secondly, we show how the inclusion of an additional Gibbs sampling step that reselects the ancestors of the particle Gibbs' extended target distribution, which is a popular approach in practice to improve mixing, does indeed yield a theoretically more efficient algorithm as measured by the asymptotic variance. Thirdly, we extend particle Gibbs to work with lower variance resampling schemes. A detailed numerical study is provided to demonstrate the efficiency of particle Gibbs and the proposed variants.

preprint · 2015 · arXiv

On Particle Methods for Parameter Estimation in State-Space Models

Nonlinear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems. However, in most applications, the state-space model of interest also depends on unknown static parameters that need to be estimated from the data. In this context, standard particle methods fail and it is necessary to rely on more sophisticated algorithms. The aim of this paper is to present a comprehensive review of particle methods that have been proposed to perform static parameter estimation in state-space models. We discuss the advantages and limitations of these methods and illustrate their performance on simple models.

preprint · 2015 · arXiv

On the properties of variational approximations of Gibbs posteriors

The PAC-Bayesian approach is a powerful set of techniques to derive non-asymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately intractable. One may sample from it using Markov chain Monte Carlo, but this is often too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. We specialise our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.

preprint · 2015 · arXiv

Towards automatic calibration of the number of state particles within the SMC$^2$ algorithm

SMC$^2$ is an efficient algorithm for sequential estimation and state inference of state-space models. It generates $N_θ$ parameter particles $θ^{m}$, and, for each $θ^{m}$, it runs a particle filter of size $N_{x}$ (i.e. at each time step, $N_{x}$ particles are generated in the state space $\mathcal{X}$). We discuss how to automatically calibrate $N_{x}$ in the course of the algorithm. Our approach relies on conditional Sequential Monte Carlo updates, monitoring the state of the pseudo random number generator and on an estimator of the variance of the unbiased estimate of the likelihood that is produced by the particle filters, which is obtained using nonparametric regression techniques. We observe that our approach is both less CPU intensive and with smaller Monte Carlo errors than the initial version of SMC$^2$.

preprint · 2014 · arXiv

Bayesian matrix completion: prior specification

Low-rank matrix estimation from incomplete measurements recently received increased attention due to the emergence of several challenging applications, such as recommender systems; see in particular the famous Netflix challenge. While the behaviour of algorithms based on nuclear norm minimization is now well understood, an as yet unexplored avenue of research is the behaviour of Bayesian algorithms in this context. In this paper, we briefly review the priors used in the Bayesian literature for matrix completion. A standard approach is to assign an inverse gamma prior to the singular values of a certain singular value decomposition of the matrix of interest; this prior is conjugate. However, we show that two other types of priors (again for the singular values) may be conjugate for this model: a gamma prior, and a discrete prior. Conjugacy is very convenient, as it makes it possible to implement either Gibbs sampling or Variational Bayes. Interestingly enough, the maximum a posteriori for these different priors is related to the nuclear norm minimization problems. We also compare all these priors on simulated datasets, and on the classical MovieLens and Netflix datasets.

preprint · 2014 · arXiv

PAC-Bayesian AUC classification and scoring

We develop a scoring and classification procedure based on the PAC-Bayesian approach and the AUC (Area Under Curve) criterion. We focus initially on the class of linear score functions. We derive PAC-Bayesian non-asymptotic bounds for two types of prior for the score parameters: a Gaussian prior, and a spike-and-slab prior; the latter makes it possible to perform feature selection. One important advantage of our approach is that it is amenable to powerful Bayesian computational tools. We derive in particular a Sequential Monte Carlo algorithm, as an efficient method which may be used as a gold standard, and an Expectation-Propagation algorithm, as a much faster but approximate method. We also extend our method to a class of non-linear score functions, essentially leading to a nonparametric procedure, by considering a Gaussian process prior.

preprint · 2014 · arXiv

Sequential Quasi-Monte Carlo

We derive and study SQMC (Sequential Quasi-Monte Carlo), a class of algorithms obtained by introducing QMC point sets in particle filtering. SQMC is related to, and may be seen as an extension of, the array-RQMC algorithm of L'Ecuyer et al. (2006). The complexity of SQMC is $O(N \log N)$, where $N$ is the number of simulations at each iteration, and its error rate is smaller than the Monte Carlo rate $O_P(N^{-1/2})$. The only requirement to implement SQMC is the ability to write the simulation of particle $x_t^n$ given $x_{t-1}^n$ as a deterministic function of $x_{t-1}^n$ and a fixed number of uniform variates. We show that SQMC is amenable to the same extensions as standard SMC, such as forward smoothing, backward smoothing, unbiased likelihood evaluation, and so on. In particular, SQMC may replace SMC within a PMCMC (particle Markov chain Monte Carlo) algorithm. We establish several convergence results. We provide numerical evidence that SQMC may significantly outperform SMC in practical scenarios.

preprint · 2014 · arXiv

The Poisson transform for unnormalised statistical models

Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of an unnormalised model on a space $Ω$ can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on $Ω$. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which includes for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.

preprint · 2013 · arXiv

Perfect simulation for the Feynman-Kac law on the path space

This paper describes an algorithm of interest. This is a preliminary version and we intend to write a better description of it and to obtain bounds for its complexity.

preprint · 2012 · arXiv

Bayesian learning of noisy Markov decision processes

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

preprint · 2012 · arXiv

Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process

A stationary Gaussian process is said to be long-range dependent (resp., anti-persistent) if its spectral density $f(λ)$ can be written as $f(λ)=|λ|^{-2d}g(|λ|)$, where $0<d<1/2$ (resp., $-1/2<d<0$), and $g$ is continuous and positive. We propose a novel Bayesian nonparametric approach for the estimation of the spectral density of such processes. We prove posterior consistency for both $d$ and $g$, under appropriate conditions on the prior distribution. We establish the rate of convergence for a general class of priors and apply our results to the family of fractionally exponential priors. Our approach is based on the true likelihood and does not resort to Whittle's approximation.

preprint · 2012 · arXiv

Computational aspects of Bayesian spectral density estimation

Gaussian time-series models are often specified through their spectral density. Such models present several computational challenges, in particular because of the non-sparse nature of the covariance matrix. We derive a fast approximation of the likelihood for such models. We propose to sample from the approximate posterior (that is, the prior times the approximate likelihood), and then to recover the exact posterior through importance sampling. We show that the variance of the importance sampling weights vanishes as the sample size goes to infinity. We explain why the approximate posterior may typically be multi-modal, and we derive a Sequential Monte Carlo sampler based on an annealing sequence in order to sample from that target distribution. Performance of the overall approach is evaluated on simulated and real datasets. In addition, for one real world dataset, we provide some numerical evidence that a Bayesian approach to semi-parametric estimation of spectral density may provide more reasonable results than its Frequentist counterparts.

preprint · 2012 · arXiv

Expectation-Propagation for Likelihood-Free Inference

Many models of interest in the natural and social sciences have no closed-form likelihood function, which means that they cannot be treated using the usual techniques of statistical inference. In the case where such models can be efficiently simulated, Bayesian inference is still possible thanks to the Approximate Bayesian Computation (ABC) algorithm. Although many refinements have been suggested, ABC inference is still far from routine. ABC is often excruciatingly slow due to very low acceptance rates. In addition, ABC requires introducing a vector of "summary statistics", the choice of which is relatively arbitrary, and often requires some trial and error, making the whole process quite laborious for the user. We introduce in this work the EP-ABC algorithm, which is an adaptation to the likelihood-free context of the variational approximation algorithm known as Expectation Propagation (Minka, 2001). The main advantage of EP-ABC is that it is faster by a few orders of magnitude than standard algorithms, while producing an overall approximation error which is typically negligible. A second advantage of EP-ABC is that it replaces the usual global ABC constraint on the vector of summary statistics computed on the whole dataset, by n local constraints that apply separately to each data-point. As a consequence, it is often possible to do away with summary statistics entirely. In that case, EP-ABC approximates directly the evidence (marginal likelihood) of the model. Comparisons are performed in three real-world applications which are typical of likelihood-free inference, including one application in neuroscience which is novel, and possibly too challenging for standard ABC techniques.
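For readers unfamiliar with the baseline that EP-ABC is compared against, the following is a minimal Python sketch of plain ABC rejection sampling with a single global summary statistic. It is not EP-ABC. The toy Gaussian model, the uniform prior, the tolerance `eps` and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        pseudo_data = simulate(theta, len(observed), rng)
        if abs(summary(pseudo_data) - s_obs) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
# Hypothetical data: 50 observations from N(2, 1); the unknown is the mean.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
n_draws = 5000
draws = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-10.0, 10.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,  # global summary statistic: the sample mean
    eps=0.1,
    n_draws=n_draws,
    rng=rng,
)
print(f"acceptance rate: {len(draws) / n_draws:.4f}")
print(f"ABC posterior mean: {statistics.fmean(draws):.2f}")
```

Even in this one-parameter toy problem only a small fraction of the simulated datasets is accepted, which is the inefficiency the abstract refers to.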

preprint · 2012 · arXiv

Fast simulation of truncated Gaussian distributions

We consider the problem of simulating a Gaussian vector $X$, conditional on the fact that each component of $X$ belongs to a finite interval $[a_i, b_i]$, or a semi-finite interval $[a_i, +\infty)$. In the one-dimensional case, we design a table-based algorithm that is computationally faster than alternative algorithms. In the two-dimensional case, we design an accept-reject algorithm. According to our calculations and our numerical studies, the acceptance rate of this algorithm is bounded from below by 0.5 for semi-finite truncation intervals, and by 0.47 for finite intervals. Extension to 3 or more dimensions is discussed.
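As a point of reference for the one-dimensional, semi-finite case, the sketch below implements a well-known accept-reject scheme with a shifted exponential proposal (due to Robert, 1995). It is one example of an alternative algorithm and is not the table-based method of this paper. The function name is an assumption, and only a standard normal truncated to $[a, +\infty)$ is handled.

```python
import math
import random


def truncated_normal_tail(a, rng):
    """Draw from N(0, 1) conditioned on X >= a."""
    if a <= 0.0:
        # Naive rejection from N(0, 1); acceptance probability is at least 0.5.
        while True:
            x = rng.gauss(0.0, 1.0)
            if x >= a:
                return x
    # Shifted exponential proposal a + Exp(rate) with the optimal rate.
    rate = 0.5 * (a + math.sqrt(a * a + 4.0))
    while True:
        x = a + rng.expovariate(rate)
        # Accept with probability exp(-(x - rate)^2 / 2).
        if rng.random() <= math.exp(-0.5 * (x - rate) ** 2):
            return x


rng = random.Random(0)
draws = [truncated_normal_tail(2.0, rng) for _ in range(10000)]
print(f"min draw: {min(draws):.3f}, mean draw: {sum(draws) / len(draws):.3f}")
```

The sample mean can be checked against the closed form $\varphi(a) / (1 - \Phi(a))$, where $\varphi$ and $\Phi$ are the standard normal density and distribution function.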

preprint · 2012 · arXiv

In praise of the referee

There has been a lively debate in many fields, including statistics and related applied fields such as psychology and biomedical research, on possible reforms of the scholarly publishing system. Currently, referees contribute so much to improve scientific papers, both directly through constructive criticism and indirectly through the threat of rejection. We discuss ways in which new approaches to journal publication could continue to make use of the valuable efforts of peer reviewers.

preprint · 2012 · arXiv

On the equivalence between standard and sequentially ordered hidden Markov models

Chopin (2007) introduced a sequentially ordered hidden Markov model, for which states are ordered according to their order of appearance, and claimed that such a model is a re-parametrisation of a standard Markov model. This note gives a formal proof that this equivalence holds in Bayesian terms, as both formulations generate equivalent posterior distributions, but does not hold in Frequentist terms, as both formulations generate incompatible likelihood functions. Perhaps surprisingly, this shows that Bayesian re-parametrisation and Frequentist re-parametrisation are not identical concepts.

preprint · 2012 · arXiv

SMC$^2$: an efficient algorithm for sequential analysis of state-space models

We consider the generic problem of performing sequential Bayesian inference in a state-space model with observation process $y$, state process $x$ and fixed parameter $\theta$. An idealized approach would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is a sequential Monte Carlo algorithm in the $\theta$-dimension, that samples values of $\theta$, reweights iteratively these values using the likelihood increments $p(y_t \mid y_{1:t-1}, \theta)$, and rejuvenates the $\theta$-particles through a resampling step and an MCMC update step. In state-space models these likelihood increments are intractable in most cases, but they may be unbiasedly estimated by a particle filter in the $x$-dimension, for any fixed $\theta$. This motivates the SMC$^2$ algorithm proposed in this article: a sequential Monte Carlo algorithm, defined in the $\theta$-dimension, which propagates and resamples many particle filters in the $x$-dimension. The filters in the $x$-dimension are an example of the random weight particle filter as in Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the $\theta$-particles target the correct posterior distribution at each iteration $t$, despite the intractability of the likelihood increments. We explore the applicability of our algorithm in both sequential and non-sequential applications and consider various degrees of freedom, as for example increasing dynamically the number of x-particles. We contrast our approach to various competing methods, both conceptually and empirically through a detailed simulation study, included here and in a supplement, and based on particularly challenging examples.
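The inner building block mentioned above, a particle filter in the state dimension that estimates the likelihood for a fixed parameter, can be sketched as follows. This is a plain bootstrap filter with multinomial resampling at every step, not the full algorithm of the paper. The linear Gaussian toy model, its parameter values and the function names are assumptions picked so that the result could be compared with a Kalman filter.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def bootstrap_loglik(ys, rho, sigma_x, sigma_y, n_particles, rng):
    """Particle estimate of the log-likelihood of ys for the toy model
    x_1 ~ N(0, sigma_x^2), x_t = rho * x_{t-1} + sigma_x * eps_t,
    y_t = x_t + sigma_y * eta_t, with eps_t and eta_t standard normal."""
    particles = [rng.gauss(0.0, sigma_x) for _ in range(n_particles)]
    loglik = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate each particle through the state dynamics.
            particles = [rho * x + rng.gauss(0.0, sigma_x) for x in particles]
        logw = [normal_logpdf(y, x, sigma_y) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # The average weight estimates the likelihood increment p(y_t | y_1:t-1).
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling at every time step.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


rng = random.Random(0)
rho, sigma_x, sigma_y = 0.9, 1.0, 0.5  # hypothetical parameter values
state, ys = rng.gauss(0.0, sigma_x), []
for _ in range(25):
    ys.append(state + rng.gauss(0.0, sigma_y))
    state = rho * state + rng.gauss(0.0, sigma_x)

estimate = bootstrap_loglik(ys, rho, sigma_x, sigma_y, 2000, rng)
print(f"particle log-likelihood estimate: {estimate:.2f}")
```

The product of the average weights is an unbiased estimate of the likelihood itself; its logarithm, which is what the function returns, is slightly biased downwards for a finite number of particles.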

preprint · 2012 · arXiv

Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.

preprint2011arXiv

Free Energy Methods for Bayesian Inference: Efficient Exploration of Univariate Gaussian Mixture Posteriors

Because of their multimodality, mixture posterior distributions are difficult to sample with standard Markov chain Monte Carlo (MCMC) methods. We propose a strategy to enhance the sampling of MCMC in this context, using a biasing procedure which originates from computational Statistical Physics. The principle is first to choose a "reaction coordinate", that is, a "direction" in which the target distribution is multimodal. In a second step, the marginal log-density of the reaction coordinate with respect to the posterior distribution is estimated; minus this quantity is called "free energy" in the computational Statistical Physics literature. To this end, we use adaptive biasing Markov chain algorithms which adapt their targeted invariant distribution on the fly, in order to overcome sampling barriers along the chosen reaction coordinate. Finally, we perform an importance sampling step in order to remove the bias and recover the true posterior. The efficiency factor of the importance sampling step can easily be estimated a priori once the bias is known, and appears to be rather large for the test cases we considered. A crucial point is the choice of the reaction coordinate. One standard choice (used for example in the classical Wang-Landau algorithm) is minus the log-posterior density. We discuss other choices. We show in particular that the hyper-parameter that determines the order of magnitude of the variance of each component is both a convenient and an efficient reaction coordinate. We also show how to adapt the method to compute the evidence (marginal likelihood) of a mixture model. We illustrate our approach by analyzing two real data sets.

preprint2011arXiv

Sequential Monte Carlo on large binary sampling spaces

A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces. A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo. Raw versions of Sequential Monte Carlo are easily implemented using binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family that takes correlations into account, analogously to the multivariate normal distribution on continuous spaces. We provide a review of models for binary data and make one of them work in the context of Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.

preprint2010arXiv

Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"

This is a collection of discussions of "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by Girolami and Calderhead, to appear in the Journal of the Royal Statistical Society, Series B.

preprint2010arXiv

Free energy Sequential Monte Carlo, application to mixture modelling

We introduce a new class of Sequential Monte Carlo (SMC) methods, which we call free energy SMC. This class is inspired by free energy methods, which originate from Physics, and where one samples from a biased distribution such that a given function $\xi(\theta)$ of the state $\theta$ is forced to be uniformly distributed over a given interval. From an initial sequence of distributions $(\pi_t)$ of interest, and a particular choice of $\xi(\theta)$, a free energy SMC sampler computes sequentially a sequence of biased distributions $(\tilde{\pi}_t)$ with the following properties: (a) the marginal distribution of $\xi(\theta)$ with respect to $\tilde{\pi}_t$ is approximately uniform over a specified interval, and (b) $\tilde{\pi}_t$ and $\pi_t$ have the same conditional distribution with respect to $\xi$. We apply our methodology to mixture posterior distributions, which are highly multimodal. In the mixture context, forcing certain hyper-parameters to higher values greatly facilitates mode swapping, and makes it possible to recover a symmetric output. We illustrate our approach with univariate and bivariate Gaussian mixtures and two real-world datasets.

preprint2010arXiv

Harold Jeffreys's Theory of Probability Revisited

Published exactly seventy years ago, Jeffreys's Theory of Probability (1939) has had a unique impact on the Bayesian community and is now considered to be one of the main classics in Bayesian Statistics as well as the initiator of the objective Bayes school. In particular, its advances on the derivation of noninformative priors as well as on the scaling of Bayes factors have had a lasting impact on the field. However, the book reflects the characteristics of the time, especially in terms of mathematical rigor. In this paper we point out the fundamental aspects of this reference work, especially the thorough coverage of testing problems and the construction of both estimation and testing noninformative priors based on functional divergences. Our major aim here is to help modern readers navigate this difficult text and concentrate on passages that are still relevant today.

preprint2010arXiv

On Particle Learning

This document is the aggregation of six discussions of Lopes et al. (2010) that we submitted to the proceedings of the Ninth Valencia Meeting, held in Benidorm, Spain, on June 3-8, 2010, in conjunction with Hedibert Lopes' talk at this meeting, and of a further discussion of the rejoinder by Lopes et al. (2010). The main point in those discussions is the potential for degeneracy in the particle learning methodology, related to the exponential forgetting of the past simulations. We illustrate in particular the resulting difficulties in the case of mixtures.

preprint2010arXiv

Rejoinder: Harold Jeffreys's Theory of Probability Revisited

We are grateful to all discussants of our re-visitation for their strong support in our enterprise and for their overall agreement with our perspective. Further discussions with them and other leading statisticians showed that the legacy of Theory of Probability is alive and lasting. [arXiv:0804.3173]

preprint2009arXiv

Properties of Nested Sampling

Nested sampling is a simulation method for approximating marginal likelihoods proposed by Skilling (2006). We establish that nested sampling has an approximation error that vanishes at the standard Monte Carlo rate and that this error is asymptotically Gaussian. We show that the asymptotic variance of the nested sampling approximation typically grows linearly with the dimension of the parameter. We discuss the applicability and efficiency of nested sampling in realistic problems, and we compare it with two current methods for computing marginal likelihood. We propose an extension that avoids resorting to Markov chain Monte Carlo to obtain the simulated points.
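To make the basic nested sampling recursion concrete, here is a minimal sketch on a one-dimensional toy problem. The Uniform(-1, 1) prior, the Gaussian-shaped likelihood with scale `s`, and the name `nested_sampling_toy` are assumptions made for this example, not the setting studied in the paper. In this toy case the likelihood decreases in |theta|, so the constrained prior can be sampled exactly and no MCMC step is needed; in general that step is the hard part.

```python
import math
import numpy as np

def nested_sampling_toy(n_live=200, n_iter=2000, s=0.1, seed=0):
    """Approximate Z = integral of L(theta) * prior(theta) on a 1D toy problem.

    Assumed setup: prior Uniform(-1, 1), L(theta) = exp(-theta^2 / (2 s^2)).
    """
    rng = np.random.default_rng(seed)

    def lik(th):
        return np.exp(-0.5 * (th / s) ** 2)

    theta = rng.uniform(-1.0, 1.0, size=n_live)  # live points drawn from the prior
    z = 0.0
    x_prev = 1.0  # prior volume enclosed by the current likelihood contour
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(lik(theta)))  # live point with the lowest likelihood
        l_min = lik(theta[worst])
        x_cur = math.exp(-i / n_live)  # deterministic approximation of the shrunk volume
        z += l_min * (x_prev - x_cur)  # quadrature term for this shell
        x_prev = x_cur
        # Replace the worst point by a prior draw subject to L > l_min.
        # Here that constraint is |theta| < r, which can be sampled exactly.
        r = abs(theta[worst])
        theta[worst] = rng.uniform(-r, r)
    # Contribution of the remaining live points.
    z += x_prev * lik(theta).mean()
    return z

# Closed-form value for the same toy problem, for comparison.
z_exact = 0.5 * 0.1 * math.sqrt(2.0 * math.pi) * math.erf(1.0 / (0.1 * math.sqrt(2.0)))
z_hat = nested_sampling_toy()
```

The estimate `z_hat` is random. Its relative error shrinks as `n_live` grows, in line with the Monte Carlo rate discussed in the abstract.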

38 published item(s)

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

De-Sequentialized Monte Carlo: a parallel-in-time particle smoother

Improved Gibbs samplers for Cosmic Microwave Background power spectrum estimation

On resampling schemes for particle filters with weakly informative observations

Adaptive Tuning Of Hamiltonian Monte Carlo Within Sequential Monte Carlo

Metropolis-Hastings with Averaged Acceptance Ratios

Negative association, ordering and convergence of resampling methods

Control functionals for Monte Carlo integration

Application of Sequential Quasi-Monte Carlo to Autonomous Positioning

Convergence of Sequential Quasi-Monte Carlo Smoothing Algorithms

Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference

Leave Pima Indians alone: binary regression as a benchmark for Bayesian computation

On particle Gibbs sampling

On Particle Methods for Parameter Estimation in State-Space Models

On the properties of variational approximations of Gibbs posteriors

Towards automatic calibration of the number of state particles within the SMC$^2$ algorithm

Bayesian matrix completion: prior specification

PAC-Bayesian AUC classification and scoring

Sequential Quasi-Monte Carlo

The Poisson transform for unnormalised statistical models

Perfect simulation for the Feynman-Kac law on the path space

Bayesian learning of noisy Markov decision processes

Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process

Computational aspects of Bayesian spectral density estimation

Expectation-Propagation for Likelihood-Free Inference

Fast simulation of truncated Gaussian distributions

In praise of the referee

On the equivalence between standard and sequentially ordered hidden Markov models

SMC^2: an efficient algorithm for sequential analysis of state-space models

Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

Free Energy Methods for Bayesian Inference: Efficient Exploration of Univariate Gaussian Mixture Posteriors

Sequential Monte Carlo on large binary sampling spaces

Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"

Free energy Sequential Monte Carlo, application to mixture modelling

Harold Jeffreys's Theory of Probability Revisited

On Particle Learning

Rejoinder: Harold Jeffreys's Theory of Probability Revisited

Properties of Nested Sampling