Source author record

Daniel Williamson

Daniel Williamson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, Computation, Applications, Machine Learning, math.ST, Statistics Theory

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Deep Gaussian Process Emulation using Stochastic Imputation

Deep Gaussian processes (DGPs) provide a rich class of models that can better represent functions with varying regimes or sharp changes, compared to conventional GPs. In this work, we propose a novel inference method for DGPs for computer model emulation. By stochastically imputing the latent layers, our approach transforms a DGP into a linked GP: a novel emulator developed for systems of linked computer models. This transformation permits an efficient DGP training procedure that only involves optimizations of conventional GPs. In addition, predictions from DGP emulators can be made in a fast and analytically tractable manner by naturally utilizing the closed form predictive means and variances of linked GP emulators. We demonstrate the method in a series of synthetic examples and empirical applications, and show that it is a competitive candidate for DGP surrogate inference, combining efficiency that is comparable to doubly stochastic variational inference and uncertainty quantification that is comparable to the fully-Bayesian approach. A Python package, dgpsi, implementing the method is also produced and available at https://github.com/mingdeyu/DGP.

Preprint, 2021, arXiv

Cross-validation based adaptive sampling for Gaussian process models

In many real-world applications, we are interested in approximating black-box, costly functions as accurately as possible with the smallest number of function evaluations. A complex computer code is an example of such a function. In this work, a Gaussian process (GP) emulator is used to approximate the output of complex computer code. We consider the problem of extending an initial experiment (set of model runs) sequentially to improve the emulator. A sequential sampling approach based on leave-one-out (LOO) cross-validation is proposed that can be easily extended to a batch mode. This is a desirable property since it saves the user time when parallel computing is available. After fitting a GP to training data points, the expected squared LOO (ES-LOO) error is calculated at each design point. ES-LOO is used as a measure to identify important data points. More precisely, when this quantity is large at a point it means that the quality of prediction depends a great deal on that point and adding more samples nearby could improve the accuracy of the GP. As a result, it is reasonable to select the next sample where ES-LOO is maximised. However, ES-LOO is only known at the experimental design and needs to be estimated at unobserved points. To do this, a second GP is fitted to the ES-LOO errors, and the next sample is chosen where the modified expected improvement (EI) criterion is maximised. EI is a popular acquisition function in Bayesian optimisation and is used to trade off between local and global search. However, it has a tendency towards exploitation, meaning that its maximum is close to the (current) "best" sample. To avoid clustering, a modified version of EI, called pseudo expected improvement, is employed, which is more explorative than EI and allows us to discover unexplored regions. Our results show that the proposed sampling method is promising.
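The LOO step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters, and it takes the expected squared LOO error at a design point to be the LOO predictive variance plus the squared LOO residual; the paper's exact definition (for example, any normalisation) may differ. The function names `rbf_kernel`, `loo_moments` and `expected_squared_loo` are hypothetical. The LOO mean and variance use the standard closed-form identities based on the inverse kernel matrix, so no GP is refitted.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between the rows of A and B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def loo_moments(X, y, noise=1e-4):
    """Closed-form leave-one-out predictive mean and variance at every design
    point, for a zero-mean GP with fixed hyperparameters (assumption)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    diag = np.diag(K_inv)
    loo_mean = y - alpha / diag   # prediction at x_i without (x_i, y_i)
    loo_var = 1.0 / diag          # predictive variance at x_i without (x_i, y_i)
    return loo_mean, loo_var

def expected_squared_loo(X, y, noise=1e-4):
    """Assumed form of the expected squared LOO error: variance + squared residual."""
    loo_mean, loo_var = loo_moments(X, y, noise)
    return loo_var + (loo_mean - y) ** 2

# Toy one-dimensional design standing in for expensive simulator runs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(12, 1))
y = np.sin(8.0 * X[:, 0])

es_loo = expected_squared_loo(X, y)
most_influential = int(np.argmax(es_loo))  # design point with the largest ES-LOO
```

In the workflow the abstract describes, values such as `es_loo` would then be modelled by a second GP and searched with an acquisition function; that step is omitted here.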

Preprint, 2020, arXiv

Classification of Computer Models with Labelled Outputs

Classification is a vital tool for modelling many complex numerical models. A model or system may be such that, for certain areas of input space, the output either does not exist, or is not in a quantifiable form. Here, we present a new method for classification where the model outputs are given distinct classifying labels, which we model using a latent Gaussian process (GP). The latent variable is estimated using MCMC sampling, a unique likelihood and distinct prior specifications. Our classifier is then verified by calculating a misclassification rate across the input space. Comparisons are made with other existing classification methods including logistic regression, which models the probability of being classified into one of two regions. To make classification predictions we draw from an independent Bernoulli distribution, meaning that distance correlation is lost from the independent draws and so can result in many misclassifications. By modelling the labels using a latent GP, this problem does not occur in our method. We apply our novel method to a range of examples including a motivating example which models the hormones associated with the reproductive system in mammals, where the two labelled outputs are high and low rates of reproduction.

Preprint, 2015, arXiv

Posterior Belief Assessment: Extracting Meaningful Subjective Judgements from Bayesian Analyses with Complex Statistical Models

In this paper, we are concerned with attributing meaning to the results of a Bayesian analysis for a problem which is sufficiently complex that we are unable to assert a precise correspondence between the expert probabilistic judgements of the analyst and the particular forms chosen for the prior specification and the likelihood for the analysis. In order to do this, we propose performing a finite collection of additional Bayesian analyses under alternative collections of prior and likelihood modelling judgements that we may also view as representative of our prior knowledge and the problem structure, and use these to compute posterior belief assessments for key quantities of interest. We show that these assessments are closer to our true underlying beliefs than the original Bayesian analysis and use the temporal sure preference principle to establish a probabilistic relationship between our true posterior judgements, our posterior belief assessment and our original Bayesian analysis to make this precise. We exploit second order exchangeability in order to generalise our approach to situations where there are infinitely many alternative Bayesian analyses we might consider as informative for our true judgements so that the method remains tractable even in these cases. We argue that posterior belief assessment is a tractable and powerful alternative to robust Bayesian analysis. We describe a methodology for computing posterior belief assessments in even the most complex of statistical models and illustrate with an example of calibrating an expensive ocean model in order to quantify uncertainty about global mean temperature in the real ocean.

Preprint, 2013, arXiv

Efficient uniform designs for multi-wave computer experiments

In this paper we tackle the problem of generating uniform designs in very small subregions of computer model input space that have been identified in previous experiments as worthy of further study. The method is capable of producing uniform designs in subregions of computer model input space defined by a membership function that consists of a continuous function passing a threshold test, and does so far more efficiently than current methods when these subregions are small. Our application is designing for regions of input space that are not ruled out by history matching, a statistical methodology applied in numerous diverse scientific applications whereby model runs are used to cut out regions of input space that are incompatible with real world observations. History matching defines a membership function for a region of input space that is not ruled out yet by observations in the form of a distance metric called implausibility. We use this distance metric to drive a new type of Evolutionary Monte Carlo algorithm with a uniform distribution on the not ruled out yet region as its target distribution. The algorithm can locate and generate uniform points within extremely small subspaces of the computer model input space with complex and even disconnected topologies. We illustrate the performance of the technique in comparison to current methods with a number of idealised examples. We then apply our algorithm to generating an optimal design for the not ruled out yet region of a galaxy simulation model called GALFORM following 4 previous waves of history matching where the target region is 0.001% the volume of the input space.
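The membership function mentioned in this abstract, a continuous implausibility measure passing a threshold test, can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it uses the commonly quoted form of implausibility (distance between observation and emulator mean, scaled by the combined standard deviation), a cut-off of 3, a made-up `toy_emulator`, and plain rejection sampling rather than the Evolutionary Monte Carlo method the authors propose. All names and numerical values are hypothetical.

```python
import numpy as np

def implausibility(x, z, emulator, obs_var, disc_var):
    """Standardised distance between observation z and the emulator prediction."""
    mean, var = emulator(x)
    return np.abs(z - mean) / np.sqrt(var + obs_var + disc_var)

def toy_emulator(x):
    """Hypothetical stand-in for a fitted emulator: returns mean and variance."""
    mean = x[:, 0] ** 2 + x[:, 1]
    var = np.full(len(x), 0.01**2)
    return mean, var

def in_nroy(x, z, threshold=3.0, obs_var=0.02**2, disc_var=0.0):
    """Membership test: True where the input is not ruled out yet (NROY)."""
    return implausibility(x, z, toy_emulator, obs_var, disc_var) <= threshold

# Naive rejection sampling: draw uniformly over the unit square, keep NROY points.
rng = np.random.default_rng(1)
candidates = rng.uniform(0.0, 1.0, size=(100_000, 2))
keep = in_nroy(candidates, z=0.5)
nroy_points = candidates[keep]   # uniform sample on the NROY region
acceptance_rate = keep.mean()    # Monte Carlo estimate of the NROY volume fraction
```

Rejection sampling of this kind yields uniform points on the target region, but its cost grows as the region shrinks: at a volume fraction of 0.001%, roughly one candidate in 100,000 would be accepted, which is the inefficiency the abstract's method is designed to avoid.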