Researcher profile

Radford M. Neal

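Radford M. Neal's listed work centers on Markov chain Monte Carlo (MCMC) methods, including Hamiltonian Monte Carlo, slice sampling, ensemble methods, and sampling for Gaussian process models.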

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
10 works
0 followers
4 topics
4 close collaborators

Published work

10 published items

preprint · 2020 · arXiv

Non-reversibly updating a uniform [0,1] value for Metropolis accept/reject decisions

I show how it can be beneficial to express Metropolis accept/reject decisions in terms of comparison with a uniform [0,1] value, u, and to then update u non-reversibly, as part of the Markov chain state, rather than sampling it independently each iteration. This provides a small improvement for random walk Metropolis and Langevin updates in high dimensions. It produces a larger improvement when using Langevin updates with persistent momentum, giving performance comparable to that of Hamiltonian Monte Carlo (HMC) with long trajectories. This is of significance when some variables are updated by other methods, since if HMC is used, these updates can be done only between trajectories, whereas they can be done more often with Langevin updates. I demonstrate that for a problem with some continuous variables, updated by HMC or Langevin updates, and also discrete variables, updated by Gibbs sampling between updates of the continuous variables, Langevin with persistent momentum and non-reversible updates to u samples nearly a factor of two more efficiently than HMC. Benefits are also seen for a Bayesian neural network model in which hyperparameters are updated by Gibbs sampling.
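
The idea of carrying u along as part of the chain state can be illustrated with a minimal sketch for random-walk Metropolis. The specific choices here are assumptions made for illustration and are not taken from the abstract: the standard normal target, the names `log_density` and `metropolis_nonreversible_u`, the representation u = |v| with v drifting by `delta` (plus a little noise) and wrapping from +1 to -1, and all tuning values. After an acceptance, u is rescaled so that the pair (x, u) keeps the distribution "x from the target, u uniform on [0,1]".

```python
import math
import random


def log_density(x):
    """Unnormalised log density of the illustrative target (standard normal)."""
    return -0.5 * x * x


def metropolis_nonreversible_u(n_iter, step=1.0, delta=0.03, noise=0.005, seed=1):
    """Random-walk Metropolis in which u is part of the state (assumed scheme)."""
    rng = random.Random(seed)
    x = 0.0
    v = rng.uniform(-1.0, 1.0)  # u = |v|; v is uniform on [-1, 1)
    samples = []
    accepts = 0
    for _ in range(n_iter):
        # Non-reversible update of u: shift v and wrap around from +1 to -1.
        # A shift modulo 2 leaves the uniform distribution of v unchanged.
        v += delta + rng.gauss(0.0, noise)
        v = (v + 1.0) % 2.0 - 1.0
        u = max(abs(v), 1e-300)  # guard against log(0)

        # Metropolis decision: accept when u < pi(x_prop) / pi(x).
        x_prop = x + rng.gauss(0.0, step)
        log_s = math.log(u) + log_density(x)  # s = u * pi(x) stays fixed
        log_p_prop = log_density(x_prop)
        if log_p_prop > log_s:
            # Keep s fixed, so the new u is s / pi(x_prop), which is below 1.
            u_new = math.exp(log_s - log_p_prop)
            v = math.copysign(u_new, v)  # keep the direction of travel
            x = x_prop
            accepts += 1
        samples.append(x)
    return samples, accepts / n_iter
```

Calling `metropolis_nonreversible_u(60000)` returns the sampled values and the acceptance rate; for this toy target the sample mean and variance should be close to 0 and 1.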

preprint · 2013 · arXiv

Inference for Belief Networks Using Coupling From the Past

Inference for belief networks using Gibbs sampling produces a distribution for unobserved variables that differs from the correct distribution by a (usually) unknown error, since convergence to the right distribution occurs only asymptotically. The method of "coupling from the past" samples from exactly the correct distribution by (conceptually) running dependent Gibbs sampling simulations from every possible starting state from a time far enough in the past that all runs reach the same state at time t=0. Explicitly considering every possible state is intractable for large networks, however. We propose a method for layered noisy-or networks that uses a compact, but often imprecise, summary of a set of states. This method samples from exactly the correct distribution, and requires only about twice the time per step as ordinary Gibbs sampling, but it may require more simulation steps than would be needed if chains were tracked exactly.
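
For intuition, the basic "coupling from the past" procedure that explicitly tracks every starting state can be sketched on a tiny chain. This is not the paper's summary-based method for noisy-or networks; the four-state random walk, the function names `update` and `cftp_sample`, and the value of `p_up` are assumptions chosen only to keep the example small.

```python
import random


def update(x, u, n_states=4, p_up=1 / 3):
    """Deterministic update: step up if u < p_up, else step down, within bounds."""
    if u < p_up:
        return min(x + 1, n_states - 1)
    return max(x - 1, 0)


def cftp_sample(rng, n_states=4, p_up=1 / 3):
    """Draw one exact sample from the stationary distribution of the toy chain."""
    us = []  # us[k] drives the transition from time -(k+1) to time -k
    t = 1
    while True:
        # Extend the random numbers further into the past. Values already
        # drawn for later times are reused, which is essential for exactness.
        while len(us) < t:
            us.append(rng.random())
        # Run a chain from every possible state, from time -t up to time 0.
        states = list(range(n_states))
        for k in range(t - 1, -1, -1):
            states = [update(x, us[k], n_states, p_up) for x in states]
        if len(set(states)) == 1:  # all starting states have coalesced
            return states[0]
        t *= 2  # not coalesced yet: start from further back
```

With `p_up = 1/3` the stationary probabilities of this toy chain are proportional to 1, 1/2, 1/4, 1/8, so repeated calls such as `cftp_sample(random.Random(0))` with a shared generator should return state 0 a little over half the time.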

preprint · 2013 · arXiv

MCMC for non-linear state space models using ensembles of latent sequences

Non-linear state space models are a widely-used class of models for biological, economic, and physical processes. Fitting these models to observed data is a difficult inference problem that has no straightforward solution. We take a Bayesian approach to the inference of unknown parameters of a non-linear state model; this, in turn, requires the availability of efficient Markov Chain Monte Carlo (MCMC) sampling methods for the latent (hidden) variables and model parameters. Using the ensemble technique of Neal (2010) and the embedded HMM technique of Neal (2003), we introduce a new Markov Chain Monte Carlo method for non-linear state space models. The key idea is to perform parameter updates conditional on an enormously large ensemble of latent sequences, as opposed to a single sequence, as with existing methods. We look at the performance of this ensemble method when doing Bayesian inference in the Ricker model of population dynamics. We show that for this problem, the ensemble method is vastly more efficient than a simple Metropolis method, as well as 1.9 to 12.0 times more efficient than a single-sequence embedded HMM method, when all methods are tuned appropriately. We also introduce a way of speeding up the ensemble method by performing partial backward passes to discard poor proposals at low computational cost, resulting in a final efficiency gain of 3.4 to 20.4 times over the single-sequence method.

preprint · 2013 · arXiv

MCMC methods for Gaussian process models using fast approximations for the likelihood

Gaussian Process (GP) models are a powerful and flexible tool for non-parametric regression and classification. Computation for GP models is intensive, since computing the posterior density, $π$, for covariance function parameters requires computation of the covariance matrix, C, a $pn^2$ operation, where p is the number of covariates and n is the number of training cases, and then inversion of C, an $n^3$ operation. We introduce MCMC methods based on the "temporary mapping and caching" framework, using a fast approximation, $π^*$, as the distribution needed to construct the temporary space. We propose two implementations under this scheme: "mapping to a discretizing chain", and "mapping with tempered transitions", both of which are exactly correct MCMC methods for sampling $π$, even though their transitions are constructed using an approximation. These methods are equivalent when their tuning parameters are set at the simplest values, but differ in general. We compare how well these methods work when using several approximations, finding on synthetic datasets that a $π^*$ based on the "Subset of Data" (SOD) method is almost always more efficient than standard MCMC using only $π$. On some datasets, a more sophisticated $π^*$ based on the "Nyström-Cholesky" method works better than SOD.

preprint · 2012 · arXiv

Gaussian Process Regression with Heteroscedastic or Non-Gaussian Residuals

Gaussian Process (GP) regression models typically assume that residuals are Gaussian and have the same variance for all observations. However, applications with input-dependent noise (heteroscedastic residuals) frequently arise in practice, as do applications in which the residuals do not have a Gaussian distribution. In this paper, we propose a GP Regression model with a latent variable that serves as an additional unobserved covariate for the regression. This model (which we call GPLC) allows for heteroscedasticity since it allows the function to have a changing partial derivative with respect to this unobserved covariate. With a suitable covariance function, our GPLC model can handle (a) Gaussian residuals with input-dependent variance, or (b) non-Gaussian residuals with input-dependent variance, or (c) Gaussian residuals with constant variance. We compare our model, using synthetic datasets, with a model proposed by Goldberg, Williams and Bishop (1998), which we refer to as GPLV, which only deals with case (a), as well as a standard GP model which can handle only case (c). Markov Chain Monte Carlo methods are developed for both models. Experiments show that when the data is heteroscedastic, both GPLC and GPLV give better results (smaller mean squared error and negative log-probability density) than standard GP regression. In addition, when the residuals are Gaussian, our GPLC model is generally nearly as good as GPLV, while when the residuals are non-Gaussian, our GPLC model is better than GPLV.

preprint · 2012 · arXiv

How to view an MCMC simulation as a permutation, with applications to parallel simulation and improved importance sampling

Consider a Markov chain defined on a finite state space, X, that leaves invariant the uniform distribution on X, and whose transition probabilities are integer multiples of 1/Q, for some integer Q. I show how a simulation of n transitions of this chain starting at x_0 can be viewed as applying a random permutation on the space XxU, where U={0,1,...,Q-1}, to the start state (x_0,u_0), with u_0 drawn uniformly from U. This result can be applied to a non-uniform distribution with probabilities that are integer multiples of 1/P, for some integer P, by representing it as the marginal distribution for X from the uniform distribution on a suitably-defined subset of XxY, where Y={0,1,...,P-1}. By letting Q, P, and the cardinality of X go to infinity, this result can be generalized to non-rational probabilities and to continuous state spaces, with permutations on a finite space replaced by volume-preserving one-to-one maps from a continuous space to itself. These constructions can be efficiently implemented for chains commonly used in Markov chain Monte Carlo (MCMC) simulations. I present two applications in this context - simulation of K realizations of a chain from K initial states, but with transitions defined by a single stream of random numbers, as may be efficient with a vector processor or multiple processors, and use of MCMC to improve an importance sampling distribution that already has substantial overlap with the distribution of interest. I also discuss the implications of this "permutation MCMC" method regarding the role of randomness in MCMC simulation, and the potential use of non-random and quasi-random numbers.
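
One way to see why a single transition can be written as a permutation of X×U is sketched below. The construction is an assumption made for illustration and is not claimed to be the one used in the paper: each pair (x, u) is treated as one of Q outgoing "slots" of state x, and is matched to one of the Q incoming slots of its destination. Because the uniform distribution is invariant, the integer matrix of counts Q·P(x, x') has every row and every column summing to Q, so the matching is one-to-one. The name `build_permutation` and the example matrix are hypothetical, and the sketch covers one transition only; how u is carried between successive transitions is not reproduced here.

```python
def build_permutation(counts):
    """Map each (x, u) to (x_new, u_new) so that the map is a bijection.

    counts[x][x_new] is Q times the transition probability from x to x_new.
    """
    n = len(counts)
    q = sum(counts[0])
    for i in range(n):
        row_sum = sum(counts[i])
        col_sum = sum(counts[j][i] for j in range(n))
        if row_sum != q or col_sum != q:
            raise ValueError("rows and columns of counts must all sum to Q")

    perm = {}
    next_incoming = [0] * n  # next unused incoming slot of each destination
    for x in range(n):
        u = 0
        for x_new in range(n):
            # Give counts[x][x_new] consecutive values of u to destination x_new.
            for _ in range(counts[x][x_new]):
                perm[(x, u)] = (x_new, next_incoming[x_new])
                next_incoming[x_new] += 1
                u += 1
    return perm, q


# Hypothetical three-state chain with Q = 4 (probabilities 2/4, 1/4, 1/4).
example_counts = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
example_perm, example_q = build_permutation(example_counts)
```

With u drawn uniformly from {0, ..., Q-1}, the first component of `example_perm[(x, u)]` is distributed according to row x of the transition matrix, while the full map sends distinct pairs to distinct pairs.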

preprint · 2012 · arXiv

Split Hamiltonian Monte Carlo

We show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up by "splitting" the Hamiltonian in a way that allows much of the movement around the state space to be done at low computational cost. One context where this is possible is when the log density of the distribution of interest (the potential energy function) can be written as the log of a Gaussian density, which is a quadratic function, plus a slowly varying function. Hamiltonian dynamics for quadratic energy functions can be analytically solved. With the splitting technique, only the slowly-varying part of the energy needs to be handled numerically, and this can be done with a larger stepsize (and hence fewer steps) than would be necessary with a direct simulation of the dynamics. Another context where splitting helps is when the most important terms of the potential energy function and its gradient can be evaluated quickly, with only a slowly-varying part requiring costly computations. With splitting, the quick portion can be handled with a small stepsize, while the costly portion uses a larger stepsize. We show that both of these splitting approaches can reduce the computational cost of sampling from the posterior distribution for a logistic regression model, using either a Gaussian approximation centered on the posterior mode, or a Hamiltonian split into a term that depends on only a small number of critical cases, and another term that involves the larger number of cases whose influence on the posterior distribution is small. Supplemental materials for this paper are available online.
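
The first splitting described above, a quadratic part solved analytically plus a slowly varying part handled numerically, can be sketched in one dimension. Everything specific here is an assumption for illustration rather than the paper's setup: the potential x²/2 + ½·log(1 + x²), unit mass, the names `split_step` and `split_hmc`, and the tuning values. The quadratic part with kinetic energy p²/2 is a harmonic oscillator, whose exact flow is a rotation in the (x, p) plane.

```python
import math
import random


def slow_potential(x):
    """Slowly varying part of the potential energy (illustrative choice)."""
    return 0.5 * math.log(1.0 + x * x)


def slow_gradient(x):
    return x / (1.0 + x * x)


def potential(x):
    """Full potential energy: quadratic part plus slowly varying part."""
    return 0.5 * x * x + slow_potential(x)


def split_step(x, p, eps):
    """One symmetric split step: half kick, exact rotation, half kick."""
    p -= 0.5 * eps * slow_gradient(x)
    # Exact solution of dx/dt = p, dp/dt = -x over time eps.
    c, s = math.cos(eps), math.sin(eps)
    x, p = x * c + p * s, -x * s + p * c
    p -= 0.5 * eps * slow_gradient(x)
    return x, p


def split_hmc(n_iter, eps=0.5, n_steps=5, seed=0):
    """HMC using the split integrator, with the usual accept/reject test."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_iter):
        p = rng.gauss(0.0, 1.0)  # fresh momentum for each trajectory
        h_start = potential(x) + 0.5 * p * p
        x_new, p_new = x, p
        for _ in range(n_steps):
            x_new, p_new = split_step(x_new, p_new, eps)
        h_end = potential(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_start - h_end)):
            x = x_new
        samples.append(x)
    return samples
```

If the slowly varying part were zero, `split_step` would conserve the energy exactly for any `eps`, which is why only that part limits the stepsize in this sketch.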

preprint · 2011 · arXiv

MCMC Using Ensembles of States for Problems with Fast and Slow Variables such as Gaussian Process Regression

I introduce a Markov chain Monte Carlo (MCMC) scheme in which sampling from a distribution with density pi(x) is done using updates operating on an "ensemble" of states. The current state x is first stochastically mapped to an ensemble, x^{(1)},...,x^{(K)}. This ensemble is then updated using MCMC updates that leave invariant a suitable ensemble density, rho(x^{(1)},...,x^{(K)}), defined in terms of pi(x^{(i)}) for i=1,...,K. Finally a single state is stochastically selected from the ensemble after these updates. Such ensemble MCMC updates can be useful when characteristics of pi and the ensemble permit pi(x^{(i)}) for all i in {1,...,K}, to be computed in less than K times the amount of computation time needed to compute pi(x) for a single x. One common situation of this type is when changes to some "fast" variables allow for quick re-computation of the density, whereas changes to other "slow" variables do not. Gaussian process regression models are an example of this sort of problem, with an overall scaling factor for covariances and the noise variance being fast variables. I show that ensemble MCMC for Gaussian process regression models can indeed substantially improve sampling performance. Finally, I discuss other possible applications of ensemble MCMC, and its relationship to the "multiple-try Metropolis" method of Liu, Liang, and Wong and the "multiset sampler" of Leman, Chen, and Lavine.

preprint · 2010 · arXiv

Covariance-Adaptive Slice Sampling

We describe two slice sampling methods for taking multivariate steps using the crumb framework. These methods use the gradients at rejected proposals to adapt to the local curvature of the log-density surface, a technique that can produce much better proposals when parameters are highly correlated. We evaluate our methods on four distributions and compare their performance to that of a non-adaptive slice sampling method and a Metropolis method. The adaptive methods perform favorably on low-dimensional target distributions with highly-correlated parameters.

preprint · 2010 · arXiv

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

The shrinking rank method is a variation of slice sampling that is efficient at sampling from multivariate distributions with highly correlated parameters. It requires that the gradient of the log-density be computable. At each individual step, it approximates the current slice with a Gaussian occupying a shrinking-dimension subspace. The dimension of the approximation is shrunk orthogonally to the gradient at rejected proposals, since the gradients at points outside the current slice tend to point towards the slice. This causes the proposal distribution to converge rapidly to an estimate of the longest axis of the slice, resulting in states that are less correlated than those generated by related methods. After describing the method, we compare it to two other methods on several distributions and obtain favorable results.