Source author record

Chris Sherlock

Chris Sherlock appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Methodology, Computation, Applications, math.ST, Statistics Theory, Machine Learning, math.PR, stat.OT

Catalog footprint

What is connected

22 works

8 topics

4 close collaborators

Preprint, 2022, arXiv

Exact Bayesian inference for discretely observed Markov Jump Processes using finite rate matrices

We present new methodologies for Bayesian inference on the rate parameters of a discretely observed continuous-time Markov jump process with a countably infinite state space. The usual method of choice for inference, particle Markov chain Monte Carlo (particle MCMC), struggles when the observation noise is small. We consider the most challenging regime of exact observations and provide two new methodologies for inference in this case: the minimal extended state space algorithm (MESA) and the nearly minimal extended state space algorithm (nMESA). By extending the Markov chain Monte Carlo state space, both MESA and nMESA use the exponentiation of finite rate matrices to perform exact Bayesian inference on the Markov jump process even though its state space is countably infinite. Numerical experiments show improvements over particle MCMC of between a factor of three and several orders of magnitude.

Preprint, 2022, arXiv

Hug and Hop: a discrete-time, non-reversible Markov chain Monte-Carlo algorithm

We introduce the Hug and Hop Markov chain Monte Carlo algorithm for estimating expectations with respect to an intractable distribution. The algorithm alternates between two kernels: Hug and Hop. Hug is a non-reversible kernel that repeatedly applies the bounce mechanism from the recently proposed Bouncy Particle Sampler to produce a proposal point far from the current position, yet on almost the same contour of the target density, leading to a high acceptance probability. Hug is complemented by Hop, which deliberately proposes jumps between contours and has an efficiency that degrades very slowly with increasing dimension. There are many parallels between Hug and Hamiltonian Monte Carlo using a leapfrog integrator, including the order of the integration scheme; however, Hug is also able to make use of local Hessian information without requiring implicit numerical integration steps, and its performance is not terminally affected by unbounded gradients of the log-posterior. We test Hug and Hop empirically on a variety of toy targets and real statistical models and find that it can, and often does, outperform Hamiltonian Monte Carlo.
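A minimal sketch of the bounce mechanism behind Hug, assuming the trajectory is built from repeated "half step, reflect the velocity in the gradient of the log-density, half step" moves. The function name `hug_trajectory` and its arguments are hypothetical; velocity refreshment, the Metropolis-Hastings correction and the Hop kernel are not shown. For an isotropic Gaussian target each bounce preserves the distance from the origin exactly, which makes the contour-following behaviour easy to check.

```python
import numpy as np

def hug_trajectory(x, v, grad_log_pi, delta, n_bounces):
    """Deterministic Hug-style trajectory (hypothetical sketch, not the paper's code).

    Each bounce moves half a step along v, reflects v in the contour of the
    target at the midpoint, then moves another half step.
    """
    x = np.array(x, dtype=float)
    v = np.array(v, dtype=float)
    for _ in range(n_bounces):
        x = x + 0.5 * delta * v                  # first half step
        g = grad_log_pi(x)
        ghat = g / np.linalg.norm(g)             # unit normal to the contour (assumes g != 0)
        v = v - 2.0 * np.dot(v, ghat) * ghat     # reflection: speed is unchanged
        x = x + 0.5 * delta * v                  # second half step
    return x, v
```

With a standard 2-D Gaussian, `hug_trajectory([1.0, 0.0], [0.0, 1.0], lambda x: -x, 0.1, 10)` travels roughly one radian around the unit circle while staying on it, which is the "far away but on almost the same contour" behaviour the abstract describes.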

Preprint, 2022, arXiv

SwISS: A Scalable Markov chain Monte Carlo Divide-and-Conquer Strategy

Divide-and-conquer strategies for Monte Carlo algorithms are an increasingly popular approach to making Bayesian inference scalable to large data sets. In its simplest form, the data are partitioned across multiple computing cores and a separate Markov chain Monte Carlo algorithm on each core targets the associated partial posterior distribution, which we refer to as a sub-posterior, that is, the posterior given only the data from the segment of the partition associated with that core. Divide-and-conquer techniques reduce computational, memory and disk bottlenecks, but make it difficult to recombine the sub-posterior samples. We propose SwISS: Sub-posteriors with Inflation, Scaling and Shifting; a new approach for recombining the sub-posterior samples which is simple to apply, scales to high-dimensional parameter spaces and accurately approximates the original posterior distribution through affine transformations of the sub-posterior samples. We prove that our transformation is asymptotically optimal across a natural set of affine transformations and illustrate the efficacy of SwISS against competing algorithms on synthetic and real-world data sets.
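To illustrate what "affine transformations of the sub-posterior samples" means, here is a minimal sketch that shifts and rescales each core's samples so that their mean and covariance match a product-of-Gaussians approximation to the full posterior. This moment-matching choice is an assumption made for illustration; it is not the asymptotically optimal transformation derived for SwISS, and the function names are hypothetical.

```python
import numpy as np

def sym_sqrt(m):
    """Symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T

def recombine_affine(subsamples):
    """Affinely map each core's sub-posterior samples onto a Gaussian
    approximation of the full posterior (hypothetical sketch).

    subsamples: list of (n_b, d) arrays, one per core.
    """
    means = [s.mean(axis=0) for s in subsamples]
    covs = [np.atleast_2d(np.cov(s, rowvar=False)) for s in subsamples]
    precs = [np.linalg.inv(c) for c in covs]
    # Product of Gaussian sub-posteriors: precisions add, means are precision-weighted.
    cov = np.linalg.inv(sum(precs))
    mean = cov @ sum(p @ m for p, m in zip(precs, means))
    root = sym_sqrt(cov)
    out = []
    for s, m, c in zip(subsamples, means, covs):
        a = root @ np.linalg.inv(sym_sqrt(c))   # chosen so that a @ c @ a.T == cov
        out.append((s - m) @ a.T + mean)         # centre, scale, then shift
    return out, mean, cov
```

Every transformed set of samples then has exactly the combined mean and covariance, so the sets can be pooled; the paper's contribution is a better choice of the matrix `a` and shift than this simple one.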

Preprint, 2021, arXiv

A Discrete Bouncy Particle Sampler

Most Markov chain Monte Carlo methods operate in discrete time and are reversible with respect to the target distribution. Nevertheless, it is now understood that the use of non-reversible Markov chains can be beneficial in many contexts. In particular, the recently proposed Bouncy Particle Sampler leverages a continuous-time and non-reversible Markov process and empirically shows state-of-the-art performance when used to explore certain probability densities; however, its implementation typically requires the computation of local upper bounds on the gradient of the log target density. We present the Discrete Bouncy Particle Sampler, a general algorithm based upon a guided random walk, a partial refreshment of direction, and a delayed-rejection step. We show that the Bouncy Particle Sampler can be understood as a scaling limit of a special case of our algorithm. In contrast to the Bouncy Particle Sampler, implementing the Discrete Bouncy Particle Sampler only requires point-wise evaluation of the target density and its gradient. We propose extensions of the basic algorithm for situations when the exact gradient of the target density is not available. In a Gaussian setting, we establish a scaling limit for the radial process as dimension increases to infinity. We leverage this result to obtain the theoretical efficiency of the Discrete Bouncy Particle Sampler as a function of the partial-refreshment parameter, which leads to a simple and robust tuning criterion. A further analysis in a more general setting suggests that this tuning criterion applies more generally. Theoretical and empirical efficiency curves are then compared for different targets and algorithm variations.

Preprint, 2021, arXiv

Efficiency of delayed-acceptance random walk Metropolis algorithms

Delayed-acceptance Metropolis-Hastings and delayed-acceptance pseudo-marginal Metropolis-Hastings algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased stochastic approximation thereof, but a computationally cheap deterministic approximation is available. An initial accept-reject stage uses the cheap approximation for computing the Metropolis-Hastings ratio; proposals which are accepted at this stage are then subjected to a further accept-reject step which corrects for the error in the approximation. Since the expensive posterior, or the approximation thereof, is only evaluated for proposals which are accepted at the first stage, the cost of the algorithm is reduced and larger scalings may be used. We focus on the random walk Metropolis (RWM) and consider the delayed-acceptance RWM and the delayed-acceptance pseudo-marginal RWM. We provide a framework for incorporating relatively general deterministic approximations into the theoretical analysis of high-dimensional targets. Justified by diffusion approximation arguments, we derive expressions for the limiting efficiency and acceptance rates in high-dimensional settings. These theoretical insights are finally leveraged to formulate practical guidelines for the efficient tuning of the algorithms. The robustness of these guidelines and predicted properties are verified against simulation studies, all of which are strictly outside of the domain of validity of our limit results.
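The two-stage accept-reject described above can be sketched for a scalar random walk Metropolis as follows. `log_cheap` and `log_exact` are hypothetical placeholders for the cheap deterministic approximation and the expensive log-posterior; the second-stage ratio divides out the cheap ratio used at the first stage, which is what keeps the exact posterior invariant.

```python
import math
import random

def da_rwm(log_cheap, log_exact, x0, step, n_iter, seed=0):
    """Delayed-acceptance random walk Metropolis on a scalar (hypothetical sketch)."""
    rng = random.Random(seed)
    x = x0
    lc_x, le_x = log_cheap(x), log_exact(x)
    chain, n_exact = [], 0
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)        # symmetric Gaussian proposal
        lc_y = log_cheap(y)
        # Stage 1: screen the proposal using only the cheap approximation.
        if rng.random() < math.exp(min(0.0, lc_y - lc_x)):
            le_y = log_exact(y)                   # expensive evaluation, survivors only
            n_exact += 1
            # Stage 2: correct for the error in the approximation.
            if rng.random() < math.exp(min(0.0, (le_y - le_x) - (lc_y - lc_x))):
                x, lc_x, le_x = y, lc_y, le_y
        chain.append(x)
    return chain, n_exact
```

`n_exact` counts the expensive evaluations; because proposals rejected at stage 1 never reach `log_exact`, it is smaller than `n_iter`, and that saving is what allows the larger scalings mentioned in the abstract.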

Preprint, 2021, arXiv

Recruitment prediction for multi-centre clinical trials based on a hierarchical Poisson-gamma model: asymptotic analysis and improved intervals

We analyse predictions of future recruitment to a multi-centre clinical trial based on a maximum-likelihood fitting of a commonly used hierarchical Poisson-Gamma model for recruitments at individual centres. We consider the asymptotic accuracy of quantile predictions in the limit as the number of recruitment centres grows large and find that, in an important sense, the accuracy of the quantiles does not improve as the number of centres increases. When predicting the number of further recruits in an additional time period, the accuracy degrades as the ratio of the additional time to the census time increases, whereas when predicting the amount of additional time to recruit a further $n^+_\bullet$ patients, the accuracy degrades as the ratio of $n^+_\bullet$ to the number recruited up to the census period increases. Our analysis suggests an improved quantile predictor. Simulation studies verify that the predicted pattern holds for typical recruitment scenarios in clinical trials and verify the much improved coverage properties of prediction intervals obtained from our quantile predictor. In the process of extending the applicability of our methodology, we show that in terms of the accuracy of all integer moments it is always better to approximate the sum of independent gamma random variables by a single gamma random variable matched on the first two moments than by the moment-matched Gaussian available from the central limit theorem.
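The moment-matched gamma approximation in the final sentence is simple to compute. For independent $\mathrm{Gamma}(a_i, b_i)$ variables (shape $a_i$, rate $b_i$) the sum has mean $\sum_i a_i/b_i$ and variance $\sum_i a_i/b_i^2$, and the single gamma with the same two moments has shape $\text{mean}^2/\text{var}$ and rate $\text{mean}/\text{var}$. The helper name below is hypothetical.

```python
def match_gamma(components):
    """Shape and rate of the single gamma matching the mean and variance of a
    sum of independent Gamma(shape, rate) variables (hypothetical helper)."""
    mean = sum(a / b for a, b in components)
    var = sum(a / b ** 2 for a, b in components)
    return mean ** 2 / var, mean / var
```

When all rates are equal the match is exact: `match_gamma([(2.0, 3.0), (1.5, 3.0)])` gives shape 3.5 and rate 3.0, the known distribution of that sum.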

Preprint, 2020, arXiv

Direct statistical inference for finite Markov jump processes via the matrix exponential

Given noisy, partial observations of a time-homogeneous, finite-state-space Markov chain, conceptually simple, direct statistical inference is available, in theory, via its rate matrix, or infinitesimal generator, $\mathsf{Q}$, since $\exp (\mathsf{Q}t)$ is the transition matrix over time $t$. However, perhaps because of inadequate tools for matrix exponentiation in programming languages commonly used amongst statisticians or a belief that the necessary calculations are prohibitively expensive, statistical inference for continuous-time Markov chains with a large but finite state space is typically conducted via particle MCMC or other relatively complex inference schemes. When, as in many applications, $\mathsf{Q}$ arises from a reaction network, it is usually sparse. We describe variations on known algorithms which allow fast, robust and accurate evaluation of the product of a non-negative vector with the exponential of a large, sparse rate matrix. Our implementation uses relatively recently developed, efficient, linear algebra tools that take advantage of such sparsity. We demonstrate the straightforward statistical application of the key algorithm on a model for the mixing of two alleles in a population and on the Susceptible-Infectious-Removed epidemic model.
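One classical way to evaluate $v^\top \exp(\mathsf{Q}t)$ for a non-negative vector $v$ is uniformization, sketched below; whether it matches the exact variants in the paper is not stated here. With $\rho \ge \max_i |q_{ii}|$, the matrix $P = I + \mathsf{Q}/\rho$ is stochastic and $v^\top \exp(\mathsf{Q}t) = \sum_{k\ge 0} e^{-\rho t}\frac{(\rho t)^k}{k!}\, v^\top P^k$. Every term is non-negative, so there is no cancellation, and only vector-matrix products are needed, which is where sparsity pays off. The function name and the tolerance are assumptions.

```python
import math
import numpy as np

def vexpm_uniformization(v, Q, t, tol=1e-12, max_terms=100_000):
    """Row vector v @ expm(Q * t) for a rate matrix Q via uniformization (sketch)."""
    Q = np.asarray(Q, dtype=float)
    rho = max(float(-Q.diagonal().min()), 1e-300)   # uniformization rate
    lam = rho * t
    if lam > 700:
        # exp(-lam) would underflow; split t into shorter intervals instead.
        raise ValueError("rho * t too large for a single uniformization step")
    P = np.eye(Q.shape[0]) + Q / rho                # stochastic matrix (dense here)
    term = np.asarray(v, dtype=float).copy()        # v @ P^k, starting at k = 0
    weight = math.exp(-lam)                         # Poisson(lam) probability of k = 0
    result, total, k = weight * term, weight, 0
    while total < 1.0 - tol and k < max_terms:      # stop once the Poisson mass is used
        k += 1
        term = term @ P
        weight *= lam / k
        result += weight * term
        total += weight
    return result
```

For a two-state chain with rates 2 (state 0 to 1) and 1 (state 1 to 0), the first entry of the result matches the closed form $1/3 + (2/3)e^{-3t}$. For a genuinely large, sparse $\mathsf{Q}$ one would keep `P` in a sparse format rather than building it densely as this sketch does.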

Preprint, 2020, arXiv

Interim recruitment prediction for multi-centre clinical trials

We introduce a general framework for monitoring, modelling, and predicting the recruitment to multi-centre clinical trials. The work is motivated by overly optimistic and narrow prediction intervals produced by existing time-homogeneous recruitment models for multi-centre recruitment. We first present two tests for detection of decay in recruitment rates, together with a power study. We then introduce a model based on the inhomogeneous Poisson process with monotonically decaying intensity, motivated by recruitment trends observed in oncology trials. The general form of the model permits adaptation to any parametric curve-shape. A general method for constructing sensible parameter priors is provided and Bayesian model averaging is used for making predictions which account for the uncertainty in both the parameters and the model. The validity of the method and its robustness to misspecification are tested using simulated datasets. The new methodology is then applied to oncology trial data, where we make interim accrual predictions, comparing them to those obtained by existing methods, and indicate where unexpected changes in the accrual pattern occur.
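Recruitment under an inhomogeneous Poisson process with monotonically decaying intensity can be simulated by thinning: draw candidate times from a homogeneous process whose rate bounds the intensity, and keep each candidate with probability $\lambda(t)/\lambda_0$. The exponential decay $\lambda(t) = \lambda_0 e^{-\gamma t}$ below is a hypothetical example of one curve shape, and the function and parameter names are assumptions.

```python
import math
import random

def simulate_decaying_recruitment(lam0, gamma, t_end, seed=0):
    """Recruitment times on [0, t_end] for intensity lam0 * exp(-gamma * t),
    simulated by thinning (hypothetical sketch)."""
    if gamma < 0:
        raise ValueError("gamma must be >= 0 so that lam0 bounds the intensity")
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam0)                # next candidate at the bounding rate lam0
        if t > t_end:
            return times
        if rng.random() < math.exp(-gamma * t):   # keep with probability lambda(t) / lam0
            times.append(t)
```

The expected number of recruits over $[0, T]$ is $\lambda_0(1 - e^{-\gamma T})/\gamma$, so averaging counts over many simulated paths gives a quick sanity check.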

Preprint, 2016, arXiv

Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods

When conducting Bayesian inference, delayed acceptance (DA) Metropolis-Hastings (MH) algorithms and DA pseudo-marginal MH algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased estimate thereof, but a computationally cheap approximation is available. A first accept-reject stage is applied, with the cheap approximation substituted for the true posterior in the MH acceptance ratio. Only for those proposals which pass through the first stage is the computationally expensive true posterior (or unbiased estimate thereof) evaluated, with a second accept-reject stage ensuring that detailed balance is satisfied with respect to the intended true posterior. In some scenarios there is no obvious computationally cheap approximation. A weighted average of previous evaluations of the computationally expensive posterior provides a generic approximation to the posterior. If only the $k$-nearest neighbours have non-zero weights then evaluation of the approximate posterior can be made computationally cheap provided that the points at which the posterior has been evaluated are stored in a multi-dimensional binary tree, known as a KD-tree. The contents of the KD-tree are potentially updated after every computationally intensive evaluation. The resulting adaptive, delayed-acceptance [pseudo-marginal] Metropolis-Hastings algorithm is justified both theoretically and empirically. Guidance on tuning parameters is provided and the methodology is applied to a discretely observed Markov jump process characterising predator-prey interactions and an ODE system describing the dynamics of an autoregulatory gene network.
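A minimal sketch of a nearest-neighbour approximation of the kind described: a weighted average of stored log-posterior evaluations at the $k$ nearest previously evaluated points. The inverse-distance weights and the choice to average log-posteriors rather than posteriors are assumptions for illustration, and the neighbour search is brute force, where the abstract stores the points in a KD-tree to make this step cheap.

```python
import numpy as np

def knn_log_posterior(x, points, log_posts, k=3, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest stored
    log-posterior values (hypothetical sketch)."""
    points = np.asarray(points, dtype=float)
    log_posts = np.asarray(log_posts, dtype=float)
    dist = np.linalg.norm(points - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]        # brute force; a KD-tree avoids this full scan
    w = 1.0 / (dist[nearest] + eps)       # eps avoids division by zero at stored points
    return float(np.dot(w, log_posts[nearest]) / w.sum())
```

In an adaptive delayed-acceptance scheme this value would play the role of the cheap first-stage approximation, and each new expensive evaluation would be appended to `points` and `log_posts`.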

Preprint, 2016, arXiv

Bayesian inference for diffusion driven mixed-effects models

Stochastic differential equations (SDEs) provide a natural framework for modelling intrinsic stochasticity inherent in many continuous-time physical processes. When such processes are observed in multiple individuals or experimental units, SDE driven mixed-effects models allow the quantification of between (as well as within) individual variation. Performing Bayesian inference for such models, using discrete time data that may be incomplete and subject to measurement error is a challenging problem and is the focus of this paper. We extend a recently proposed MCMC scheme to include the SDE driven mixed-effects framework. Fundamental to our approach is the development of a novel construct that allows for efficient sampling of conditioned SDEs that may exhibit nonlinear dynamics between observation times. We apply the resulting scheme to synthetic data generated from a simple SDE model of orange tree growth, and real data consisting of observations on aphid numbers recorded under a variety of different treatment regimes. In addition, we provide a systematic comparison of our approach with an inference scheme based on a tractable approximation of the SDE, that is, the linear noise approximation.

Preprint, 2016, arXiv

Improved bridge constructs for stochastic differential equations

We consider the task of generating discrete-time realisations of a nonlinear multivariate diffusion process satisfying an Itô stochastic differential equation conditional on an observation taken at a fixed future time-point. Such realisations are typically termed diffusion bridges. Since, in general, no closed form expression exists for the transition densities of the process of interest, a widely adopted solution works with the Euler-Maruyama approximation, by replacing the intractable transition densities with Gaussian approximations. However, the density of the conditioned discrete-time process remains intractable, necessitating the use of computationally intensive methods such as Markov chain Monte Carlo. Designing an efficient proposal mechanism which can be applied to a noisy and partially observed system that exhibits nonlinear dynamics is a challenging problem, and is the focus of this paper. By partitioning the process into two parts, one that accounts for nonlinear dynamics in a deterministic way, and another as a residual stochastic process, we develop a class of novel constructs that bridge the residual process via a linear approximation. In addition, we adapt a recently proposed construct to a partial and noisy observation regime. We compare the performance of each new construct with a number of existing approaches, using three applications.

Preprint, 2016, arXiv

Particle Metropolis-adjusted Langevin algorithms

This paper proposes a new sampling scheme based on Langevin dynamics that is applicable within pseudo-marginal and particle Markov chain Monte Carlo algorithms. We investigate this algorithm's theoretical properties under standard asymptotics, which correspond to an increasing dimension of the parameters, $n$. Our results show that the behaviour of the algorithm depends crucially on how accurately one can estimate the gradient of the log target density. If the error in the estimate of the gradient is not sufficiently controlled as dimension increases, then asymptotically there will be no advantage over the simpler random-walk algorithm. However, if the error is sufficiently well-behaved, then the optimal scaling of this algorithm will be $O(n^{-1/6})$ compared to $O(n^{-1/2})$ for the random walk. Our theory also gives guidelines on how to tune the number of Monte Carlo samples in the likelihood estimate and the proposal step-size.
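For reference, the Langevin proposal and its Metropolis-Hastings correction look as follows when the target and its gradient are known exactly. In the particle version studied here both would be replaced by estimates (for example from a particle filter); that part is not shown. Here `h` is the proposal variance, so $\sqrt{h}$ plays the role of the scaling discussed in the abstract, and all names are hypothetical.

```python
import numpy as np

def mala(log_pi, grad_log_pi, x0, h, n_iter, seed=0):
    """Metropolis-adjusted Langevin algorithm with exact gradients (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp, g = log_pi(x), grad_log_pi(x)

    def log_q(to, frm, g_frm):
        # Log proposal density, up to a constant, of moving frm -> to.
        mean = frm + 0.5 * h * g_frm
        return -np.sum((to - mean) ** 2) / (2.0 * h)

    samples, accepted = np.empty((n_iter, x.size)), 0
    for i in range(n_iter):
        # Langevin step: drift up the gradient plus Gaussian noise of variance h.
        y = x + 0.5 * h * g + np.sqrt(h) * rng.standard_normal(x.size)
        lp_y, g_y = log_pi(y), grad_log_pi(y)
        log_alpha = lp_y + log_q(x, y, g_y) - lp - log_q(y, x, g)
        if np.log(rng.random()) < log_alpha:
            x, lp, g = y, lp_y, g_y
            accepted += 1
        samples[i] = x
    return samples, accepted / n_iter
```

If noisy gradient estimates were substituted for `grad_log_pi` without controlling their error, the drift term would add little beyond noise, which is the intuition behind the abstract's warning that the algorithm may then offer no advantage over the random walk.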

Preprint, 2016, arXiv

Pseudo-marginal Metropolis--Hastings using averages of unbiased estimators

We consider a pseudo-marginal Metropolis--Hastings kernel $P_m$ that is constructed using an average of $m$ exchangeable random variables, as well as an analogous kernel $P_s$ that averages $s<m$ of these same random variables. Using an embedding technique to facilitate comparisons, we show that the asymptotic variances of ergodic averages associated with $P_m$ are lower bounded in terms of those associated with $P_s$. We show that the bound provided is tight and disprove a conjecture that when the random variables to be averaged are independent, the asymptotic variance under $P_m$ is never less than $s/m$ times the variance under $P_s$. The conjecture does, however, hold when considering continuous-time Markov chains. These results imply that if the computational cost of the algorithm is proportional to $m$, it is often better to set $m=1$. We provide intuition as to why these findings differ so markedly from recent results for pseudo-marginal kernels employing particle filter approximations. Our results are exemplified through two simulation studies; in the first the computational cost is effectively proportional to $m$ and in the second there is a considerable start-up cost at each iteration.

Preprint, 2016, arXiv

Residual-Bridge Constructs for Conditioned Diffusions

We introduce a new residual-bridge proposal for approximately simulating conditioned diffusions. This proposal is formed by applying the modified diffusion bridge approximation of Durham and Gallant (2002) to the difference between the true diffusion and a second, approximate diffusion driven by the same Brownian motion, and can be viewed as a natural extension to recent work on residual-bridge constructs (Whitaker et al., 2016). This new proposal attempts to account for volatilities which are not constant and can therefore lead to gains in efficiency over the recently proposed residual-bridge constructs in situations where the volatility varies considerably, as is often the case for larger inter-observation times and for time-inhomogeneous volatilities. These potential gains in efficiency are illustrated via a simulation study.
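As a point of reference, the modified diffusion bridge of Durham and Gallant (2002), in its commonly stated scalar form, moves from $x_t$ over a step $\Delta t$ towards the endpoint $x_T$ with mean $x_t + (x_T - x_t)\Delta t/(T - t)$ and variance $\beta(x_t)\,\Delta t\,(T - t - \Delta t)/(T - t)$, where $\beta$ is the diffusion coefficient. The sketch below simulates a plain (not residual) bridge with this construct; the function name and the example $\beta$ are hypothetical.

```python
import math
import random

def mdb_path(x0, x_end, beta, t_end, n_steps, seed=0):
    """Scalar modified-diffusion-bridge path from x0 at time 0 to x_end at t_end (sketch)."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    path, x = [x0], x0
    for i in range(n_steps):
        remaining = (n_steps - i) * dt            # time left until the endpoint
        mean = x + (x_end - x) * dt / remaining   # pull linearly towards x_end
        var = beta(x) * dt * (remaining - dt) / remaining
        x = mean + math.sqrt(max(var, 0.0)) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

On the final step the variance is zero and the mean equals the endpoint, so every path finishes at `x_end`. The residual-bridge idea in the abstract applies this same construct to the residual process rather than to the diffusion itself.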

Preprint, 2015, arXiv

Optimal scaling for the pseudo-marginal random walk Metropolis: insensitivity to the noise generating mechanism

We examine the optimal scaling and the efficiency of the pseudo-marginal random walk Metropolis algorithm using a recently derived result on the limiting efficiency as the dimension, $d\rightarrow \infty$. We prove that the optimal scaling for a given target varies by less than $20\%$ across a wide range of distributions for the noise in the estimate of the target, and that any scaling that is within $20\%$ of the optimal one will be at least $70\%$ efficient. We demonstrate that this phenomenon occurs even outside the range of distributions for which we rigorously prove it. We then conduct a simulation study on an example with $d=10$ where importance sampling is used to estimate the target density; we also examine results available from an existing simulation study with $d=5$ in which a particle filter was used. Our key conclusions also hold in these examples.

Preprint, 2014, arXiv

Bayesian Inference for Hybrid Discrete-Continuous Stochastic Kinetic Models

We consider the problem of efficiently performing simulation and inference for stochastic kinetic models. Whilst it is possible to work directly with the resulting Markov jump process, computational cost can be prohibitive for networks of realistic size and complexity. In this paper, we consider an inference scheme based on a novel hybrid simulator that classifies reactions as either "fast" or "slow" with fast reactions evolving as a continuous Markov process whilst the remaining slow reaction occurrences are modelled through a Markov jump process with time dependent hazards. A linear noise approximation (LNA) of fast reaction dynamics is employed and slow reaction events are captured by exploiting the ability to solve the stochastic differential equation driving the LNA. This simulation procedure is used as a proposal mechanism inside a particle MCMC scheme, thus allowing Bayesian inference for the model parameters. We apply the scheme to a simple application and compare the output with an existing hybrid approach and also a scheme for performing inference for the underlying discrete stochastic model.

Preprint, 2014, arXiv

Delayed acceptance particle MCMC for exact inference in stochastic kinetic models

Recently-proposed particle MCMC methods provide a flexible way of performing Bayesian inference for parameters governing stochastic kinetic models defined as Markov (jump) processes (MJPs). Each iteration of the scheme requires an estimate of the marginal likelihood calculated from the output of a sequential Monte Carlo scheme (also known as a particle filter). Consequently, the method can be extremely computationally intensive. We therefore aim to avoid most instances of the expensive likelihood calculation through use of a fast approximation. We consider two approximations: the chemical Langevin equation diffusion approximation (CLE) and the linear noise approximation (LNA). Either an estimate of the marginal likelihood under the CLE, or the tractable marginal likelihood under the LNA can be used to calculate a first step acceptance probability. Only if a proposal is accepted under the approximation do we then run a sequential Monte Carlo scheme to compute an estimate of the marginal likelihood under the true MJP and construct a second stage acceptance probability that permits exact (simulation based) inference for the MJP. We therefore avoid expensive calculations for proposals that are likely to be rejected. We illustrate the method by considering inference for parameters governing a Lotka-Volterra system, a model of gene expression and a simple epidemic process.

Preprint, 2014, arXiv

Inference for reaction networks using the Linear Noise Approximation

We consider inference for the reaction rates in discretely observed networks such as those found in models for systems biology, population ecology and epidemics. Most such networks are neither slow enough nor small enough for inference via the true state-dependent Markov jump process to be feasible. Typically, inference is conducted by approximating the dynamics through an ordinary differential equation (ODE), or a stochastic differential equation (SDE). The former ignores the stochasticity in the true model, and can lead to inaccurate inferences. The latter is more accurate but is harder to implement as the transition density of the SDE model is generally unknown. The Linear Noise Approximation (LNA) is a first order Taylor expansion of the approximating SDE about a deterministic solution and can be viewed as a compromise between the ODE and SDE models. It is a stochastic model, but discrete time transition probabilities for the LNA are available through the solution of a series of ordinary differential equations. We describe how a restarting LNA can be efficiently used to perform inference for a general class of reaction networks; evaluate the accuracy of such an approach; and show how and when this approach is either statistically or computationally more efficient than ODE or SDE methods. We apply the LNA to analyse Google Flu Trends data from the North and South Islands of New Zealand, and are able to obtain more accurate short-term forecasts of new flu cases than another recently proposed method, although at a greater computational cost.
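To show how LNA transition probabilities come from ordinary differential equations, consider a hypothetical immigration-death network: immigration at constant rate $\alpha$ and death at rate $\mu x$. The LNA mean $\phi$ solves $d\phi/dt = \alpha - \mu\phi$ and the variance solves $dV/dt = -2\mu V + \alpha + \mu\phi$ (twice the Jacobian times $V$, plus the summed hazards). The sketch integrates both over one interval with Euler steps; a restarting LNA would re-initialise them at each observation, which is not shown.

```python
def lna_immigration_death(alpha, mu, phi0, v0, t_end, dt=1e-3):
    """Euler integration of the LNA mean and variance ODEs for an
    immigration-death process (hypothetical example network)."""
    phi, v = phi0, v0
    for _ in range(int(round(t_end / dt))):
        dphi = alpha - mu * phi                   # macroscopic rate equation
        dv = -2.0 * mu * v + alpha + mu * phi     # linearised drift plus hazard noise
        phi, v = phi + dt * dphi, v + dt * dv
    return phi, v                                 # LNA transition: Normal(phi, v)
```

Both quantities settle at $\alpha/\mu$, consistent with the Poisson$(\alpha/\mu)$ stationary distribution of the exact Markov jump process for this network.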

Preprint, 2014, arXiv

On the efficiency of pseudo-marginal random walk Metropolis algorithms

We examine the behaviour of the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed precisely. Under relatively general conditions on the target distribution, we obtain limiting formulae for the acceptance rate and for the expected squared jump distance, as the dimension of the target approaches infinity, under the assumption that the noise in the estimate of the log-target is additive and is independent of the position. For targets with independent and identically distributed components, we also obtain a limiting diffusion for the first component. We then consider the overall efficiency of the algorithm, in terms of both speed of mixing and computational time. Assuming the additive noise is Gaussian and is inversely proportional to the number of unbiased estimates that are used, we prove that the algorithm is optimally efficient when the variance of the noise is approximately 3.283 and the acceptance rate is approximately 7.001%. We also find that the optimal scaling is insensitive to the noise and that the optimal variance of the noise is insensitive to the scaling. The theory is illustrated with a simulation study using the particle marginal random walk Metropolis.
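A minimal sketch of the pseudo-marginal random walk Metropolis under the noise model in the abstract: the estimated log-target equals the true log-target plus additive Gaussian noise that does not depend on position. The noise mean $-\sigma^2/2$ makes the estimate of the target itself unbiased. The key implementation detail is that the estimate at the current point is stored and reused, never refreshed. Function names are hypothetical and the noise is simulated rather than produced by a real estimator.

```python
import math
import random

def pm_rwm(log_target, x0, step, sigma, n_iter, seed=0):
    """Pseudo-marginal RWM on a scalar with simulated Gaussian log-noise (sketch)."""
    rng = random.Random(seed)

    def log_estimate(x):
        # exp(noise) has mean 1 when noise ~ N(-sigma^2/2, sigma^2).
        return log_target(x) + rng.gauss(-0.5 * sigma ** 2, sigma)

    x, lx = x0, log_estimate(x0)
    chain, accepted = [], 0
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)
        ly = log_estimate(y)
        # The stored estimate lx for the current point is recycled, not recomputed.
        if rng.random() < math.exp(min(0.0, ly - lx)):
            x, lx = y, ly
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter
```

Increasing `sigma` lowers the acceptance rate for a fixed `step`; the abstract's result balances that loss against the cost of reducing the noise, giving an optimal noise variance of about 3.283.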

Preprint, 2013, arXiv

A coupled hidden Markov model for disease interactions

To investigate interactions between parasite species in a host, a population of field voles was studied longitudinally, with presence or absence of six different parasites measured repeatedly. Although trapping sessions were regular, a different set of voles was caught at each session leading to incomplete profiles for all subjects. We use a discrete-time hidden Markov model for each disease with transition probabilities dependent on covariates via a set of logistic regressions. For each disease the hidden states for each of the other diseases at a given time point form part of the covariate set for the Markov transition probabilities from that time point. This allows us to gauge the influence of each parasite species on the transition probabilities for each of the other parasite species. Inference is performed via a Gibbs sampler, which cycles through each of the diseases, first using an adaptive Metropolis-Hastings step to sample from the conditional posterior of the covariate parameters for that particular disease given the hidden states for all other diseases and then sampling from the hidden states for that disease given the parameters. We find evidence for interactions between several pairs of parasites and of an acquired immune response for two of the parasites.

Preprint, 2013, arXiv

Langevin diffusions and the Metropolis-adjusted Langevin algorithm

We provide a clarification of the description of Langevin diffusions on Riemannian manifolds and of the measure underlying the invariant density. As a result we propose a new position-dependent Metropolis-adjusted Langevin algorithm (MALA) based upon a Langevin diffusion in $\mathbb{R}^d$ which has the required invariant density with respect to Lebesgue measure. We show that our diffusion and the diffusion upon which a previously-proposed position-dependent MALA is based are equivalent in some cases but are distinct in general. A simulation study illustrates the gain in efficiency provided by the new position-dependent MALA.

Preprint, 2010, arXiv

The Random Walk Metropolis: Linking Theory and Practice Through a Case Study

The random walk Metropolis (RWM) is one of the most common Markov chain Monte Carlo algorithms in practical use today. Its theoretical properties have been extensively explored for certain classes of target, and a number of results with important practical implications have been derived. This article draws together a selection of new and existing key results and concepts and describes their implications. The impact of each new idea on algorithm efficiency is demonstrated for the practical example of the Markov modulated Poisson process (MMPP). A reparameterization of the MMPP which leads to a highly efficient RWM-within-Gibbs algorithm in certain circumstances is also presented.
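One of the best-known practical results for the RWM is the classical optimal-scaling rule for targets with i.i.d. components: use a Gaussian proposal with scale $\ell/\sqrt{d}$, $\ell \approx 2.38$, which gives an acceptance rate near $0.234$ as $d \to \infty$. The sketch below applies this rule to a hypothetical standard Gaussian target; at finite $d$ the acceptance rate is only roughly $0.234$, so no exact value is claimed.

```python
import numpy as np

def rwm(log_pi, x0, scale, n_iter, seed=0):
    """Random walk Metropolis with an isotropic Gaussian proposal (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_pi(x)
    samples, accepted = np.empty((n_iter, x.size)), 0
    for i in range(n_iter):
        y = x + scale * rng.standard_normal(x.size)   # symmetric proposal
        lp_y = log_pi(y)
        if rng.random() < np.exp(min(0.0, lp_y - lp)):
            x, lp = y, lp_y
            accepted += 1
        samples[i] = x
    return samples, accepted / n_iter

d = 20
samples, acc = rwm(lambda x: -0.5 * x @ x, np.zeros(d), 2.38 / np.sqrt(d), 20000)
```

In practice the rule is often used in reverse: tune `scale` until the empirical acceptance rate is in the region of 0.2 to 0.3.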
