Source author record

Christopher Drovandi

Christopher Drovandi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Computation, Methodology, Applications, Machine Learning, econ.EM, Populations and Evolution, Quantitative Methods

Catalog footprint

21 works

7 topics

4 close collaborators


preprint, 2026, arXiv

Bayesian score calibration for approximate models

Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with a surrogate, but this can result in a posterior with poor uncertainty quantification. In this paper, we propose a new method for adjusting approximate posterior samples to reduce bias and improve posterior coverage properties. We do this by optimizing a transformation of the approximate posterior, the result of which maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We develop supporting theory for our method and demonstrate beneficial corrections to approximate posteriors across several examples of increasing complexity.

preprint, 2023, arXiv

Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics

Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products.

preprint, 2022, arXiv

A Comparison of Likelihood-Free Methods With and Without Summary Statistics

Likelihood-free methods are useful for parameter estimation of complex models with intractable likelihood functions for which it is easy to simulate data. Such models are prevalent in many disciplines including genetics, biology, ecology and cosmology. Likelihood-free methods avoid explicit likelihood evaluation by finding parameter values of the model that generate data close to the observed data. The general consensus has been that it is most efficient to compare datasets on the basis of a low dimensional informative summary statistic, incurring information loss in favour of reduced dimensionality. More recently, researchers have explored various approaches for efficiently comparing empirical distributions in the likelihood-free context in an effort to avoid data summarisation. This article provides a review of these full data distance based approaches, and conducts the first comprehensive comparison of such methods, both qualitatively and empirically. We also conduct a substantive empirical comparison with summary statistic based likelihood-free methods. The discussion and results offer guidance to practitioners considering a likelihood-free approach. Whilst we find the best approach to be problem dependent, we also find that the full data distance based approaches are promising and warrant further development. We discuss some opportunities for future research in this space. Computer code to implement the methods discussed in this paper can be found at https://github.com/cdrovandi/ABC-dist-compare.
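The idea of "finding parameter values of the model that generate data close to the observed data" can be illustrated with a minimal rejection sampler. This is only a sketch under stated assumptions, not code from the paper or its repository: the toy model N(theta, 1), the Uniform(-5, 5) prior, the sample mean as summary statistic, and the names `simulate` and `abc_rejection` are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Simulate n observations from the assumed toy model N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lands near the observed one."""
    s_obs = statistics.fmean(observed)  # low-dimensional summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the assumed uniform prior
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:  # compare summaries, not likelihoods
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate(2.0, 50, rng)
posterior_sample = abc_rejection(observed, 5000, 0.1, rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

A full data distance approach of the kind the article reviews would replace the comparison of sample means with a distance between the empirical distributions of the simulated and observed datasets.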

preprint, 2022, arXiv

Component-wise iterative ensemble Kalman inversion for static Bayesian models with unknown measurement error covariance

The ensemble Kalman filter (EnKF) is a Monte Carlo approximation of the Kalman filter for high dimensional linear Gaussian state space models. EnKF methods have also been developed for parameter inference of static Bayesian models with a Gaussian likelihood, in a way that is analogous to likelihood tempering sequential Monte Carlo (SMC). These methods are commonly referred to as ensemble Kalman inversion (EKI). Unlike SMC, the inference from EKI is only asymptotically unbiased if the likelihood is linear Gaussian and the priors are Gaussian. However, EKI is significantly faster to run. Currently, a large limitation of EKI methods is that the covariance of the measurement error is assumed to be fully known. We develop a new method, which we call component-wise iterative ensemble Kalman inversion (CW-IEKI), that allows elements of the covariance matrix to be inferred alongside the model parameters at negligible extra cost. This novel method is compared to SMC on three different application examples: a model of nitrogen mineralisation in soil that is based on the Agricultural Production Systems Simulator (APSIM), a model predicting seagrass decline due to stress from water temperature and light, and a model predicting coral calcification rates. On all of these examples, we find that CW-IEKI has relatively similar predictive performance to SMC, albeit with greater uncertainty, and it has a significantly faster run time.

preprint, 2022, arXiv

Efficient inference and identifiability analysis for differential equation models with random parameters

Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. Therefore, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. As our novel method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood, and a Bayesian approach based on Markov-chain Monte Carlo. Through three case studies, we demonstrate our method by providing a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters from independent observed data. Our approach has a computational cost comparable to analysis of models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid better understanding of the sources of heterogeneity from biological data.

preprint, 2022, arXiv

Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localisation

Likelihood-free methods are an essential tool for performing inference for implicit models which can be simulated from, but for which the corresponding likelihood is intractable. However, common likelihood-free methods do not scale well to a large number of model parameters. A promising approach to high-dimensional likelihood-free inference involves estimating low-dimensional marginal posteriors by conditioning only on summary statistics believed to be informative for the low-dimensional component, and then combining the low-dimensional approximations in some way. In this paper, we demonstrate that such low-dimensional approximations can be surprisingly poor in practice for seemingly intuitive summary statistic choices. We describe an idealized low-dimensional summary statistic that is, in principle, suitable for marginal estimation. However, a direct approximation of the idealized choice is difficult in practice. We thus suggest an alternative approach to marginal estimation which is easier to implement and automate. Given an initial choice of low-dimensional summary statistic that might only be informative about a marginal posterior location, the new method improves performance by first crudely localising the posterior approximation using all the summary statistics to ensure global identifiability, followed by a second step that homes in on an accurate low-dimensional approximation using the low-dimensional summary statistic. We show that the posterior this approach targets can be represented as a logarithmic pool of posterior distributions based on the low-dimensional and full summary statistics, respectively. The good performance of our method is illustrated in several examples.

preprint, 2022, arXiv

Modularized Bayesian analyses and cutting feedback in likelihood-free inference

There has been much recent interest in modifying Bayesian inference for misspecified models so that it is useful for specific purposes. One popular modified Bayesian inference method is "cutting feedback" which can be used when the model consists of a number of coupled modules, with only some of the modules being misspecified. Cutting feedback methods represent the full posterior distribution in terms of conditional and sequential components, and then modify some terms in such a representation based on the modular structure for specification or computation of a modified posterior distribution. The main goal of this is to avoid contamination of inferences for parameters of interest by misspecified modules. Computation for cut posterior distributions is challenging, and here we consider cutting feedback for likelihood-free inference based on Gaussian mixture approximations to the joint distribution of parameters and data summary statistics. We exploit the fact that marginal and conditional distributions of a Gaussian mixture are Gaussian mixtures to give explicit approximations to marginal or conditional posterior distributions so that we can easily approximate cut posterior analyses. The mixture approach allows repeated approximation of posterior distributions for different data based on a single mixture fit, which is important for model checks which aid in the decision of whether to "cut". A semi-modular approach to likelihood-free inference where feedback is partially cut is also developed. The benefits of the method are illustrated in two challenging examples, a collective cell spreading model and a continuous time model for asset returns with jumps.

preprint, 2022, arXiv

Monte Carlo twisting for particle filters

We consider the problem of designing efficient particle filters for twisted Feynman-Kac models. Particle filters using twisted models can deliver low error approximations of statistical quantities and such twisting functions can be learnt iteratively. Practical implementations of these algorithms are complicated by the need to (i) sample from the twisted transition dynamics, and (ii) calculate the twisted potential functions. We expand the class of applicable models using rejection sampling for (i) and unbiased approximations for (ii) using a random weight particle filter. We characterise the average acceptance rates within the particle filter in order to control the computational cost, and analyse the asymptotic variance. Empirical results show the mean squared error of the normalising constant estimate in our method is smaller than a memory-equivalent particle filter but not a computation-equivalent filter. Both comparisons are improved when more efficient sampling is possible, which we demonstrate on a stochastic volatility model.

preprint, 2022, arXiv

Optimal Bayesian design for model discrimination via classification

Performing optimal Bayesian design for discriminating between competing models is computationally intensive as it involves estimating posterior model probabilities for thousands of simulated datasets. This issue is compounded further when the likelihood functions for the rival models are computationally expensive. A new approach using supervised classification methods is developed to perform Bayesian optimal model discrimination design. This approach requires considerably fewer simulations from the candidate models than previous approaches using approximate Bayesian computation. Further, it is easy to assess the performance of the optimal design through the misclassification error rate. The approach is particularly useful in the presence of models with intractable likelihoods but can also provide computational advantages when the likelihoods are manageable.

preprint, 2022, arXiv

Population Calibration using Likelihood-Free Bayesian Inference

In this paper we develop a likelihood-free approach for population calibration, which involves finding distributions of model parameters that, when fed through the model, produce a set of outputs that matches available population data. Unlike most other approaches to population calibration, our method produces uncertainty quantification on the estimated distribution. Furthermore, the method can be applied to any population calibration problem, regardless of whether the model of interest is deterministic or stochastic, or whether the population data is observed with or without measurement error. We demonstrate the method on several examples, including one with real data. We also discuss the computational limitations of the approach. Immediate applications for the methodology developed here exist in many areas of medical research including cancer, COVID-19, drug development and cardiology.

preprint, 2022, arXiv

Regularized Zero-Variance Control Variates

Zero-variance control variates (ZV-CV) are a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort lies in solving a linear regression problem. Significant variance reductions have been achieved with this method in low dimensional examples, but the number of covariates in the regression rapidly increases with the dimension of the target. In this paper, we present compelling empirical evidence that the use of penalized regression techniques in the selection of high-dimensional control variates provides performance gains over the classical least squares method. Another type of regularization based on using subsets of derivatives, or a priori regularization as we refer to it in this paper, is also proposed to reduce computational and storage requirements. Several examples showing the utility and limitations of regularized ZV-CV for Bayesian inference are given. The methods proposed in this paper are accessible through the R package ZVCV.
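The post-processing step described above (a linear regression on derivatives of the log target) can be sketched for a one-dimensional target. This is an illustrative sketch only and does not use the ZVCV R package: the function name `zv_cv_estimate`, the Gaussian target N(3, 4), and the use of independent draws in place of MCMC output are assumptions. It shows only the unregularized, first-order least squares version, not the penalized variants the paper studies.

```python
import random
import statistics


def zv_cv_estimate(f_values, scores):
    """First-order control variate estimate of E[f(X)].

    scores[i] is d/dx log pi(x_i), which has expectation zero under pi.
    The coefficient a is the least squares slope, with the sign flipped.
    """
    f_bar = statistics.fmean(f_values)
    u_bar = statistics.fmean(scores)
    cov_fu = sum((f - f_bar) * (u - u_bar) for f, u in zip(f_values, scores))
    var_u = sum((u - u_bar) ** 2 for u in scores)
    a = -cov_fu / var_u
    adjusted = [f + a * u for f, u in zip(f_values, scores)]
    return statistics.fmean(adjusted)


# Assumed toy target: N(mu, sigma^2) with mu = 3 and sigma = 2.
mu, sigma = 3.0, 2.0
rng = random.Random(7)
draws = [rng.gauss(mu, sigma) for _ in range(200)]
scores = [-(x - mu) / sigma**2 for x in draws]  # derivative of the log density

print("plain Monte Carlo:", statistics.fmean(draws))
print("with control variate:", zv_cv_estimate(draws, scores))
```

For this Gaussian target and f(x) = x, the fitted coefficient equals sigma^2 and every adjusted value collapses to mu, so the variance is removed entirely up to floating-point rounding. In higher dimensions the number of regression covariates grows quickly, which is the setting where the abstract argues for penalized regression.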

preprint, 2022, arXiv

The effect of biologically mediated decay rates on modelling soil carbon sequestration in agricultural settings

Microbial biomass carbon (MBC), a crucial soil labile carbon fraction, is the most active component of the soil organic carbon (SOC) that regulates bio-geochemical processes in terrestrial ecosystems. Some studies in the literature ignore the effect of microbial population growth on carbon decomposition rates. In reality, we might expect that the decomposition rate should be related to the population of microbes in the soil and have a positive relationship with the size of the microbial biomass pool. In this study, we explore the effect of microbial population growth on the accuracy of modelling soil carbon sequestration by developing and comparing two soil carbon models that consider a carrying capacity and limit to the growth of the microbial pool. We apply our models to three datasets, two small and one large, and we select the model with the best predictive performance using two model selection methods. Through this analysis we reveal that commonly used complex soil carbon models can over-fit in the presence of both small and large time-series datasets, and our simpler model can produce more accurate predictions. We conclude that considering the microbial population growth in a soil carbon model improves the accuracy of a model in the presence of a large dataset.

preprint, 2021, arXiv

Accelerating sequential Monte Carlo with surrogate likelihoods

Delayed-acceptance is a technique for reducing computational effort for Bayesian models with expensive likelihoods. Using a delayed-acceptance kernel for Markov chain Monte Carlo can reduce the number of expensive likelihood evaluations required to approximate a posterior expectation. Delayed-acceptance uses a surrogate, or approximate, likelihood to avoid evaluation of the expensive likelihood when possible. Within the sequential Monte Carlo framework, we utilise the history of the sampler to adaptively tune the surrogate likelihood to yield better approximations of the expensive likelihood, and use a surrogate-first annealing schedule to further increase computational efficiency. Moreover, we propose a framework for optimising computation time whilst avoiding particle degeneracy, which encapsulates existing strategies in the literature. Overall, we develop a novel algorithm for computationally efficient SMC with expensive likelihood functions. The method is applied to static Bayesian models, which we demonstrate on toy and real examples, code for which is available at https://github.com/bonStats/smcdar.
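The two-stage screening idea behind delayed acceptance can be sketched as a plain Metropolis-Hastings kernel with a symmetric random-walk proposal. This is a minimal sketch of the generic delayed-acceptance kernel, not the adaptive SMC algorithm from the paper or the smcdar code: the function name, the N(1, 1) "expensive" target and the deliberately rough N(0.8, 1.5^2) surrogate are assumptions chosen for illustration.

```python
import math
import random


def delayed_acceptance_mh(log_target, log_surrogate, theta0, n_iter, step, rng):
    """Random-walk Metropolis with a surrogate screening stage.

    Returns the chain and the number of expensive log_target evaluations.
    """
    theta = theta0
    lp, ls = log_target(theta), log_surrogate(theta)
    chain, n_expensive = [], 1
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        ls_prop = log_surrogate(prop)
        # Stage 1: accept or reject using the cheap surrogate only.
        if math.log(1.0 - rng.random()) < ls_prop - ls:
            lp_prop = log_target(prop)  # expensive call happens only here
            n_expensive += 1
            # Stage 2: correct for the surrogate so the chain targets log_target.
            if math.log(1.0 - rng.random()) < (lp_prop - lp) - (ls_prop - ls):
                theta, lp, ls = prop, lp_prop, ls_prop
        chain.append(theta)
    return chain, n_expensive


def log_target(t):
    """Stand-in for an expensive log posterior: N(1, 1) up to a constant."""
    return -0.5 * (t - 1.0) ** 2


def log_surrogate(t):
    """Cheap, slightly wrong approximation: N(0.8, 1.5^2) up to a constant."""
    return -0.5 * ((t - 0.8) / 1.5) ** 2


rng = random.Random(5)
chain, n_expensive = delayed_acceptance_mh(log_target, log_surrogate, 0.0, 5000, 1.0, rng)
print("posterior mean estimate:", sum(chain) / len(chain))
print("expensive evaluations:", n_expensive, "of", len(chain), "iterations")
```

Proposals rejected at the first stage never trigger the expensive evaluation, which is where the saving comes from. The closer the surrogate is to the target, the fewer proposals are wasted at the second stage.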

preprint, 2020, arXiv

Efficient Bayesian estimation for GARCH-type models via Sequential Monte Carlo

The advantages of sequential Monte Carlo (SMC) are exploited to develop parameter estimation and model selection methods for GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) style models. It provides an alternative method for quantifying estimation uncertainty relative to classical inference. Even with long time series, it is demonstrated that the posterior distribution of model parameters is non-normal, highlighting the need for a Bayesian approach and an efficient posterior sampling method. Efficient approaches for both constructing the sequence of distributions in SMC, and leave-one-out cross-validation, for long time series data are also proposed. Finally, an unbiased estimator of the likelihood is developed for the Bad Environment-Good Environment model, a complex GARCH-type model, which permits exact Bayesian inference not previously available in the literature.
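Any posterior sampler for a GARCH-type model, SMC included, has to evaluate the likelihood through the conditional variance recursion. The sketch below shows that recursion for the textbook GARCH(1,1) model with Gaussian innovations. It is an assumption-laden illustration rather than the paper's method: the function name `garch11_loglik`, the Gaussian innovations, and initialising the variance at its unconditional value are choices made here, and the paper's models are more general.

```python
import math


def garch11_loglik(returns, omega, alpha, beta):
    """Gaussian GARCH(1,1) log-likelihood.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    """
    # Reject parameters outside the usual positivity and stationarity region.
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return -math.inf
    sigma2 = omega / (1.0 - alpha - beta)  # assumed start: unconditional variance
    loglik = 0.0
    for r in returns:
        loglik += -0.5 * (math.log(2.0 * math.pi * sigma2) + r * r / sigma2)
        sigma2 = omega + alpha * r * r + beta * sigma2  # variance for the next step
    return loglik


example_returns = [0.5, -1.2, 0.3, 2.1, -0.4]
print(garch11_loglik(example_returns, omega=0.1, alpha=0.1, beta=0.8))
```

Because each term depends on all earlier returns through the recursion, one likelihood evaluation costs time linear in the series length, which is part of why efficient sampling matters for long time series.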

preprint, 2020, arXiv

Efficient Bayesian synthetic likelihood with whitening transformations

Likelihood-free methods are an established approach for performing approximate Bayesian inference for models with intractable likelihood functions. However, they can be computationally demanding. Bayesian synthetic likelihood (BSL) is a popular such method that approximates the likelihood function of the summary statistic with a known, tractable distribution (typically Gaussian) and then performs statistical inference using standard likelihood-based techniques. However, as the number of summary statistics grows, the number of model simulations required to accurately estimate the covariance matrix for this likelihood rapidly increases. This poses a significant challenge for the application of BSL, especially in cases where model simulation is expensive. In this article we propose whitening BSL (wBSL), an efficient BSL method that uses approximate whitening transformations to decorrelate the summary statistics at each algorithm iteration. We show empirically that this can reduce the number of model simulations required to implement BSL by more than an order of magnitude, without much loss of accuracy. We explore a range of whitening procedures and demonstrate the performance of wBSL on a range of simulated and real modelling scenarios from ecology and biology.
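The standard Gaussian synthetic likelihood that wBSL builds on can be sketched with two summary statistics, which keeps the covariance algebra explicit. This is a sketch of plain synthetic likelihood only and does not implement whitening. The toy model N(theta, 1), the summaries (sample mean and log sample variance), and the names `simulate`, `summaries` and `synthetic_loglik` are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate(theta, n, rng):
    """Simulate n observations from the assumed toy model N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summaries(data):
    """Two summary statistics: sample mean and log sample variance."""
    return (statistics.fmean(data), math.log(statistics.variance(data)))


def synthetic_loglik(theta, s_obs, n_sims, n_obs, rng):
    """Fit a bivariate Gaussian to simulated summaries and evaluate it at s_obs."""
    sims = [summaries(simulate(theta, n_obs, rng)) for _ in range(n_sims)]
    m1 = statistics.fmean(s[0] for s in sims)
    m2 = statistics.fmean(s[1] for s in sims)
    # Sample covariance matrix of the summaries, estimated from simulations.
    v11 = sum((s[0] - m1) ** 2 for s in sims) / (n_sims - 1)
    v22 = sum((s[1] - m2) ** 2 for s in sims) / (n_sims - 1)
    v12 = sum((s[0] - m1) * (s[1] - m2) for s in sims) / (n_sims - 1)
    det = v11 * v22 - v12 * v12
    d1, d2 = s_obs[0] - m1, s_obs[1] - m2
    # Quadratic form d^T V^{-1} d using the closed-form 2x2 inverse.
    quad = (v22 * d1 * d1 - 2.0 * v12 * d1 * d2 + v11 * d2 * d2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


rng = random.Random(3)
s_obs = summaries(simulate(0.0, 30, rng))
print(synthetic_loglik(0.0, s_obs, 200, 30, rng))
print(synthetic_loglik(3.0, s_obs, 200, 30, rng))
```

With d summary statistics the covariance matrix has d(d+1)/2 free entries to estimate from simulations at every parameter value, which is the cost the abstract says grows rapidly. Decorrelating the summaries first, as wBSL proposes, is aimed at making that estimate usable with fewer simulations.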

preprint, 2020, arXiv

Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation

The Banana Bunchy Top Virus (BBTV) is one of the most economically important vector-borne banana diseases throughout the Asia-Pacific Basin and presents a significant challenge to the agricultural sector. Current models of BBTV are largely deterministic, limited by an incomplete understanding of interactions in complex natural systems, and the appropriate identification of parameters. A stochastic network-based Susceptible-Infected model has been created which simulates the spread of BBTV across the subsections of a banana plantation, parameterising nodal recovery, neighbouring and distant infectivity across summer and winter. Posterior results obtained through a Markov chain Monte Carlo approach to approximate Bayesian computation suggest seasonality in all parameters, which are influenced by correlated changes in inspection accuracy, temperatures and aphid activity. This paper demonstrates how the model may be used for monitoring and forecasting of various disease management strategies to support policy-level decision making.

preprint, 2020, arXiv

Robust Approximate Bayesian Computation: An Adjustment Approach

We propose a novel approach to approximate Bayesian computation (ABC) that seeks to cater for possible misspecification of the assumed model. This new approach can be equally applied to rejection-based ABC and to popular regression adjustment ABC. We demonstrate that this new approach mitigates the poor performance of regression adjusted ABC that can eventuate when the model is misspecified. In addition, this new adjustment approach allows us to detect which features of the observed data cannot be reliably reproduced by the assumed model. A series of simulated and empirical examples illustrate this new approach.

preprint, 2020, arXiv

Robust Approximate Bayesian Inference with Synthetic Likelihood

Bayesian synthetic likelihood (BSL) is now an established method for conducting approximate Bayesian inference in models where, due to the intractability of the likelihood function, exact Bayesian approaches are either infeasible or computationally too demanding. Implicit in the application of BSL is the assumption that the data generating process (DGP) can produce simulated summary statistics that capture the behaviour of the observed summary statistics. We demonstrate that if this compatibility between the actual and assumed DGP is not satisfied, i.e., if the model is misspecified, BSL can yield unreliable parameter inference. To circumvent this issue, we propose a new BSL approach that can detect the presence of model misspecification, and simultaneously deliver useful inferences even under significant model misspecification. Two simulated and two real data examples demonstrate the performance of this new approach to BSL, and document its superior accuracy over standard BSL when the assumed model is misspecified.

preprint, 2020, arXiv

Sequential Bayesian Experimental Design for Implicit Models via Mutual Information

Bayesian experimental design (BED) is a framework that uses statistical models and decision making under uncertainty to optimise the cost and performance of a scientific experiment. Sequential BED, as opposed to static BED, considers the scenario where we can sequentially update our beliefs about the model parameters through data gathered in the experiment. A class of models of particular interest for the natural and medical sciences are implicit models, where the data generating distribution is intractable, but sampling from it is possible. Even though there has been a lot of work on static BED for implicit models in the past few years, the notoriously difficult problem of sequential BED for implicit models has barely been touched upon. We address this gap in the literature by devising a novel sequential design framework for parameter estimation that uses the Mutual Information (MI) between model parameters and simulated data as a utility function to find optimal experimental designs, which has not been done before for implicit models. Our approach uses likelihood-free inference by ratio estimation to simultaneously estimate posterior distributions and the MI. During the sequential BED procedure we utilise Bayesian optimisation to help us optimise the MI utility. We find that our framework is efficient for the various implicit models tested, yielding accurate parameter estimates after only a few iterations.

preprint, 2020, arXiv

Sequential Experimental Design for Predator-Prey Functional Response Experiments

Understanding functional response within a predator-prey dynamic is a cornerstone for many quantitative ecological studies. Over the past 60 years, the methodology for modelling functional response has gradually transitioned from the classic mechanistic models to more statistically oriented models. To obtain inferences on these statistical models, a substantial number of experiments need to be conducted. The obvious disadvantages of collecting this volume of data include cost, time and the sacrificing of animals. Therefore, optimally designed experiments are useful as they may reduce the total number of experimental runs required to attain the same statistical results. In this paper, we develop the first sequential experimental design method for predator-prey functional response experiments. To make inferences on the parameters in each of the statistical models we consider, we use sequential Monte Carlo, which is computationally efficient and facilitates convenient estimation of important utility functions. It provides coverage of experimental goals including parameter estimation, model discrimination as well as a combination of these. The results of our simulation study illustrate that for predator-prey functional response experiments sequential design outperforms static design for our experimental goals. R code for implementing the methodology is available via https://github.com/haydenmoffat/sequential_design_for_predator_prey_experiments.

preprint, 2020, arXiv

Transformations in Semi-Parametric Bayesian Synthetic Likelihood

Bayesian synthetic likelihood (BSL) is a popular method for performing approximate Bayesian inference when the likelihood function is intractable. In synthetic likelihood methods, the likelihood function is approximated parametrically via model simulations, and then standard likelihood-based techniques are used to perform inference. The Gaussian synthetic likelihood estimator has become ubiquitous in BSL literature, primarily for its simplicity and ease of implementation. However, it is often too restrictive and may lead to poor posterior approximations. Recently, a more flexible semi-parametric Bayesian synthetic likelihood (semiBSL) estimator has been introduced, which is significantly more robust to irregularly distributed summary statistics. In this work, we propose a number of extensions to semiBSL. First, we consider even more flexible estimators of the marginal distributions using transformation kernel density estimation. Second, we propose whitening semiBSL (wsemiBSL), a method to significantly improve the computational efficiency of semiBSL. wsemiBSL uses an approximate whitening transformation to decorrelate summary statistics at each algorithm iteration. The methods developed herein significantly improve the versatility and efficiency of BSL algorithms.
