Source author record

Andrew M. Stuart

Andrew M. Stuart appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.NA, math.PR, math.ST, Statistics Theory, Numerical Analysis, Machine Learning, Computation, math.OC, Methodology, math.DS, Applications, Artificial Intelligence, math.AP, math.SP, physics.comp-ph

Catalog footprint

What is connected

40 works

15 topics

4 close collaborators

preprint · 2026 · arXiv

Amortized Energy-Based Bayesian Inference

We consider amortized Bayesian inference for nonlinear inverse problems in settings where only samples from the joint distribution of parameters and observations are available. Classical methods such as Markov chain Monte Carlo require solving a new inference problem for each observation, which can be computationally prohibitive when inference must be repeated many times. We propose a transport-based approach that learns an observation-dependent map pushing forward a reference measure to approximate the posterior distribution. The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward. This formulation is likelihood-free, requiring only joint samples, and avoids density evaluation, invertibility constraints, and Jacobian determinant computations. For function-space inverse problems with Gaussian priors, we parameterize the transport map as the identity plus a perturbation in the Cameron-Martin space of the prior, preserving absolute continuity with respect to the prior. In infinite-dimensional settings, the map is represented using neural operators. We illustrate the method on a finite-dimensional nonlinear inverse problem and two PDE-constrained inverse problems arising in porous medium flow and seismic inversion. The results show that the learned transport captures posterior structure, including multimodality and dominant modes, while enabling fast posterior sampling for new observations.
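As a rough illustration of the kind of sample-based quantity such an objective is built on, the sketch below estimates the squared energy distance between two point clouds using only samples, with no density evaluation. It is not the paper's training code: the function name `energy_distance_sq` and the toy Gaussian samples are assumptions made for illustration, and the averaging over observations described in the abstract is omitted.

```python
import numpy as np

def energy_distance_sq(x, y):
    """V-statistic estimate of the squared energy distance between two samples.

    x has shape (n, d) and y has shape (m, d). Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (a_i, b_j).
        diff = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diff, axis=-1).mean()

    # 2 E|X - Y| - E|X - X'| - E|Y - Y'|, with expectations replaced by sample means.
    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))          # toy "reference" samples (assumed)
b = rng.normal(size=(200, 2)) + 3.0    # toy samples shifted away from the reference

d_same = energy_distance_sq(a, a)      # identical samples
d_shifted = energy_distance_sq(a, b)   # clearly separated samples
```

The estimate vanishes for identical samples and grows as the two clouds separate, which is what makes it usable as a likelihood-free training signal.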

preprint · 2024 · arXiv

Learning Homogenization for Elliptic Operators

Multiscale partial differential equations (PDEs) arise in various applications, and several schemes have been developed to solve them efficiently. Homogenization theory is a powerful methodology that eliminates the small-scale dependence, resulting in simplified equations that are computationally tractable while accurately predicting the macroscopic response. In the field of continuum mechanics, homogenization is crucial for deriving constitutive laws that incorporate microscale physics in order to formulate balance laws for the macroscopic quantities of interest. However, obtaining homogenized constitutive laws is often challenging as they do not in general have an analytic form and can exhibit phenomena not present on the microscale. In response, data-driven learning of the constitutive law has been proposed as appropriate for this task. However, a major challenge in data-driven learning approaches for this problem has remained unexplored: the impact of discontinuities and corner interfaces in the underlying material. These discontinuities in the coefficients affect the smoothness of the solutions of the underlying equations. Given the prevalence of discontinuous materials in continuum mechanics applications, it is important to address the challenge of learning in this context; in particular, to develop underpinning theory that establishes the reliability of data-driven methods in this scientific domain. The paper addresses this unexplored challenge by investigating the learnability of homogenized constitutive laws for elliptic operators in the presence of such complexities. Approximation theory is presented, and numerical experiments are performed which validate the theory in the context of learning the solution operator defined by the cell problem arising in homogenization for elliptic PDEs.

preprint · 2022 · arXiv

A Framework for Machine Learning of Model Error in Dynamical Systems

The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.

preprint · 2022 · arXiv

Efficient Derivative-free Bayesian Inference for Large-Scale Inverse Problems

We consider Bayesian inference for large scale inverse problems, where computational challenges arise from the need for repeated evaluations of an expensive forward model. This renders most Markov chain Monte Carlo approaches infeasible, since they typically require $O(10^4)$ model runs, or more. Moreover, the forward model is often given as a black box or is impractical to differentiate. Therefore derivative-free algorithms are highly desirable. We propose a framework, which is built on Kalman methodology, to efficiently perform Bayesian inference in such inverse problems. The basic method is based on an approximation of the filtering distribution of a novel mean-field dynamical system into which the inverse problem is embedded as an observation operator. Theoretical properties of the mean-field model are established for linear inverse problems, demonstrating that the desired Bayesian posterior is given by the steady state of the law of the filtering distribution of the mean-field dynamical system, and proving exponential convergence to it. This suggests that, for nonlinear problems which are close to Gaussian, sequentially computing this law provides the basis for efficient iterative methods to approximate the Bayesian posterior. Ensemble methods are applied to obtain interacting particle system approximations of the filtering distribution of the mean-field model; and practical strategies to further reduce the computational and memory cost of the methodology are presented, including low-rank approximation and a bi-fidelity approach. The effectiveness of the framework is demonstrated in several numerical experiments, including proof-of-concept linear/nonlinear examples and two large-scale applications: learning of permeability parameters in subsurface flow; and learning subgrid-scale parameters in a global climate model from time-averaged statistics.

preprint · 2022 · arXiv

Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods

The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations. This paper is concerned with the use of interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large scale parametric structure of interest. A motivating example from climate science is presented, and ensemble Kalman methods (which do not use the derivative of the parameter-to-data map) are shown, empirically, to perform well. Multiscale analysis is then used to analyze the behaviour of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favourably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, whilst ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favourably.

preprint · 2022 · arXiv

Iterated Kalman Methodology For Inverse Problems

This paper is focused on the optimization approach to the solution of inverse problems. We introduce a stochastic dynamical system in which the parameter-to-data map is embedded, with the goal of employing techniques from nonlinear Kalman filtering to estimate the parameter given the data. The extended Kalman filter (which we refer to as ExKI in the context of inverse problems) can be effective for some inverse problems approached this way, but is impractical when the forward map is not readily differentiable and is given as a black box, and also for high dimensional parameter spaces because of the need to propagate large covariance matrices. Application of ensemble Kalman filters, for example use of the ensemble Kalman inversion (EKI) algorithm, has emerged as a useful tool which overcomes both of these issues: it is derivative free and works with a low-rank covariance approximation formed from the ensemble. In this paper, we work with the ExKI, EKI, and a variant on EKI which we term unscented Kalman inversion (UKI). The paper contains two main contributions. Firstly, we identify a novel stochastic dynamical system in which the parameter-to-data map is embedded. We present theory in the linear case to show exponential convergence of the mean of the filtering distribution to the solution of a regularized least squares problem. This is in contrast to previous work in which the EKI has been employed where the dynamical system used leads to algebraic convergence to an unregularized problem. Secondly, we show that the application of the UKI to this novel stochastic dynamical system yields improved inversion results, in comparison with the application of EKI to the same novel stochastic dynamical system.
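For orientation, a basic derivative-free EKI update can be sketched as below. This is a minimal illustration of the generic ensemble update only, not the novel stochastic dynamical system or the UKI variant introduced in the paper. The names `eki_step` and `forward`, the toy linear map `A`, and the noise covariance `Gamma` are all assumptions chosen for the example.

```python
import numpy as np

def eki_step(U, forward, y, Gamma):
    """One deterministic EKI update (illustrative sketch).

    U has shape (J, d): J ensemble members in a d-dimensional parameter space.
    Only evaluations of `forward` are used; no derivatives are required.
    """
    P = np.array([forward(u) for u in U])    # forward evaluations, shape (J, k)
    du = U - U.mean(axis=0)                  # parameter deviations from ensemble mean
    dp = P - P.mean(axis=0)                  # output deviations from ensemble mean
    J = U.shape[0]
    C_up = du.T @ dp / J                     # (d, k) cross-covariance, rank at most J - 1
    C_pp = dp.T @ dp / J                     # (k, k) output covariance
    K = C_up @ np.linalg.inv(C_pp + Gamma)   # Kalman-type gain, shape (d, k)
    return U + (y - P) @ K.T                 # move every member toward the data

# Toy linear problem (assumed for illustration).
A = np.array([[2.0, 1.0], [0.5, 3.0]])

def forward(u):
    return A @ u

rng = np.random.default_rng(1)
y = forward(np.array([1.0, -1.0]))           # noise-free synthetic data
Gamma = 0.01 * np.eye(2)
U = rng.normal(size=(20, 2))                 # initial ensemble

misfit_before = np.linalg.norm(y - forward(U.mean(axis=0)))
for _ in range(10):
    U = eki_step(U, forward, y, Gamma)
misfit_after = np.linalg.norm(y - forward(U.mean(axis=0)))
```

The covariances are formed from the ensemble alone, which is the sense in which the abstract describes EKI as derivative free and low rank.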

preprint · 2022 · arXiv

Learning Markovian Homogenized Models in Viscoelasticity

Fully resolving dynamics of materials with rapidly-varying features involves expensive fine-scale computations which need to be conducted on macroscopic scales. The theory of homogenization provides an approach to derive effective macroscopic equations which eliminates the small scales by exploiting scale separation. An accurate homogenized model avoids the computationally-expensive task of numerically solving the underlying balance laws at a fine scale, thereby rendering a numerical solution of the balance laws more computationally tractable. In complex settings, homogenization only defines the constitutive model implicitly, and machine learning can be used to learn the constitutive model explicitly from localized fine-scale simulations. In the case of one-dimensional viscoelasticity, the linearity of the model allows for a complete analysis. We establish that the homogenized constitutive model may be approximated by a recurrent neural network (RNN) that captures the memory. The memory is encapsulated in the evolution of an appropriate finite set of internal variables, discovered through the learning process and dependent on the history of the strain. Simulations are presented which validate the theory. Guidance for the learning of more complex models, such as arise in plasticity, by similar techniques, is given.

preprint · 2022 · arXiv

The Cost-Accuracy Trade-Off In Operator Learning With Neural Networks

The term 'surrogate modeling' in computational science and engineering refers to the development of computationally efficient approximations for expensive simulations, such as those arising from numerical solution of partial differential equations (PDEs). Surrogate modeling is an enabling methodology for many-query computations in science and engineering, which include iterative methods in optimization and sampling methods in uncertainty quantification. Over the last few years, several approaches to surrogate modeling for PDEs using neural networks have emerged, motivated by successes in using neural networks to approximate nonlinear maps in other areas. In principle, the relative merits of these different approaches can be evaluated by understanding, for each one, the cost required to achieve a given level of accuracy. However, the absence of a complete theory of approximation error for these approaches makes it difficult to assess this cost-accuracy trade-off. The purpose of the paper is to provide a careful numerical study of this issue, comparing a variety of different neural network architectures for operator approximation across a range of problems arising from PDE models in continuum mechanics.

preprint · 2021 · arXiv

The Random Feature Model for Input-Output Maps between Banach Spaces

Well known to the machine learning community, the random feature model is a parametric approximation to kernel interpolation or regression methods. It is typically used to approximate functions mapping a finite-dimensional input space to the real line. In this paper, we instead propose a methodology for use of the random feature model as a data-driven surrogate for operators that map an input Banach space to an output Banach space. Although the methodology is quite general, we consider operators defined by partial differential equations (PDEs); here, the inputs and outputs are themselves functions, with the input parameters being functions required to specify the problem, such as initial data or coefficients, and the outputs being solutions of the problem. Upon discretization, the model inherits several desirable attributes from this infinite-dimensional viewpoint, including mesh-invariant approximation error with respect to the true PDE solution map and the capability to be trained at one mesh resolution and then deployed at different mesh resolutions. We view the random feature model as a non-intrusive data-driven emulator, provide a mathematical framework for its interpretation, and demonstrate its ability to efficiently and accurately approximate the nonlinear parameter-to-solution maps of two prototypical PDEs arising in physical science and engineering applications: viscous Burgers' equation and a variable coefficient elliptic equation.

preprint · 2020 · arXiv

Consistency of semi-supervised learning algorithms on graphs: Probit and one-hot methods

Graph-based semi-supervised learning is the problem of propagating labels from a small number of labelled data points to a larger set of unlabelled data. This paper is concerned with the consistency of optimization-based techniques for such problems, in the limit where the labels have small noise and the underlying unlabelled data is well clustered. We study graph-based probit for binary classification, and a natural generalization of this method to multi-class classification using one-hot encoding. The resulting objective function to be optimized comprises the sum of a quadratic form defined through a rational function of the graph Laplacian, involving only the unlabelled data, and a fidelity term involving only the labelled data. The consistency analysis sheds light on the choice of the rational function defining the optimization.

preprint · 2020 · arXiv

Reconciling Bayesian and perimeter regularization for binary inversion

A central theme in classical algorithms for the reconstruction of discontinuous functions from observational data is perimeter regularization via the use of the total variation. On the other hand, sparse or noisy data often demands a probabilistic approach to the reconstruction of images, to enable uncertainty quantification; the Bayesian approach to inversion, which itself introduces a form of regularization, is a natural framework in which to carry this out. In this paper the link between Bayesian inversion methods and perimeter regularization is explored. Two links are studied: (i) the maximum a posteriori (MAP) objective function of a suitably chosen Bayesian phase-field approach is shown to be closely related to a least squares plus perimeter regularization objective; (ii) sample paths of a suitably chosen Bayesian level set formulation are shown to possess finite perimeter and to have the ability to learn about the true perimeter.

preprint · 2020 · arXiv

Spectral Analysis Of Weighted Laplacians Arising In Data Clustering

Graph Laplacians computed from weighted adjacency matrices are widely used to identify geometric structure in data, and clusters in particular; their spectral properties play a central role in a number of unsupervised and semi-supervised learning algorithms. When suitably scaled, graph Laplacians approach limiting continuum operators in the large data limit. Studying these limiting operators, therefore, sheds light on learning algorithms. This paper is devoted to the study of a parameterized family of divergence form elliptic operators that arise as the large data limit of graph Laplacians. The link between a three-parameter family of graph Laplacians and a three-parameter family of differential operators is explained. The spectral properties of these differential operators are analyzed in the situation where the data comprises two nearly separated clusters, in a sense which is made precise. In particular, we investigate how the spectral gap depends on the three parameters entering the graph Laplacian, and on a parameter measuring the size of the perturbation from the perfectly clustered case. Numerical results are presented which exemplify and extend the analysis: the computations study situations in which there are two nearly separated clusters, but which violate the assumptions used in our theory; situations in which more than two clusters are present, also going beyond our theory; and situations which demonstrate the relevance of our studies of differential operators for the understanding of finite data problems via the graph Laplacian. The findings provide insight into parameter choices made in learning algorithms which are based on weighted adjacency matrices; they also provide the basis for analysis of the consistency of various unsupervised and semi-supervised learning algorithms, in the large data limit.

preprint · 2016 · arXiv

Analysis of the ensemble Kalman filter for inverse problems

The ensemble Kalman filter (EnKF) is a widely used methodology for state estimation in partially and noisily observed dynamical systems, and for parameter estimation in inverse problems. Despite its widespread use in the geophysical sciences, and its gradual adoption in many other areas of application, analysis of the method is in its infancy. Furthermore, much of the existing analysis deals with the large ensemble limit, far from the regime in which the method is typically used. The goal of this paper is to analyze the method when applied to inverse problems with fixed ensemble size. A continuous-time limit is derived and the long-time behavior of the resulting dynamical system is studied. Most of the rigorous analysis is confined to the linear forward problem, where we demonstrate that the continuous time limit of the EnKF corresponds to a set of gradient flows for the data misfit in each ensemble member, coupled through a common pre-conditioner which is the empirical covariance matrix of the ensemble. Numerical results demonstrate that the conclusions of the analysis extend beyond the linear inverse problem setting. Numerical experiments are also given which demonstrate the benefits of various extensions of the basic methodology.
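The preconditioned gradient-flow structure described in this abstract can be written out schematically as follows. The notation is an assumption introduced here, not taken from the record: $A$ is the linear forward map, $\Gamma$ the observational noise covariance, $y$ the data, and $u^{(1)},\dots,u^{(J)}$ the ensemble members.

```latex
\[
\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}
  = -\,C(u)\,\nabla_u \Phi\bigl(u^{(j)};y\bigr),
  \qquad j = 1,\dots,J,
\]
\[
\Phi(u;y) = \tfrac{1}{2}\,\bigl\|\Gamma^{-1/2}(y - Au)\bigr\|^{2},
\qquad
C(u) = \frac{1}{J}\sum_{k=1}^{J}\bigl(u^{(k)}-\bar{u}\bigr)\otimes\bigl(u^{(k)}-\bar{u}\bigr),
\qquad
\bar{u} = \frac{1}{J}\sum_{k=1}^{J}u^{(k)}.
\]
```

Each member follows the gradient of its own data misfit $\Phi$, and the members interact only through the shared empirical covariance $C(u)$.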

preprint · 2016 · arXiv

Gaussian Approximations of Small Noise Diffusions in Kullback-Leibler Divergence

We study Gaussian approximations to the distribution of a diffusion. The approximations are easy to compute: they are defined by two simple ordinary differential equations for the mean and the covariance. Time correlations can also be computed via solution of a linear stochastic differential equation. We show, using the Kullback-Leibler divergence, that the approximations are accurate in the small noise regime. An analogous discrete time setting is also studied. The results provide both theoretical support for the use of Gaussian processes in the approximation of diffusions, and methodological guidance in the construction of Gaussian approximations in applications.
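One standard way to obtain two such ordinary differential equations is to linearize the diffusion about its mean. The sketch below assumes a diffusion of the form $\mathrm{d}X = f(X)\,\mathrm{d}t + \sqrt{\varepsilon}\,\sigma\,\mathrm{d}W$ with small parameter $\varepsilon$; this notation is introduced here for illustration, and the paper's exact formulation may differ.

```latex
\[
\frac{\mathrm{d}m}{\mathrm{d}t} = f(m),
\qquad
\frac{\mathrm{d}C}{\mathrm{d}t} = Df(m)\,C + C\,Df(m)^{\top} + \varepsilon\,\sigma\sigma^{\top},
\]
\[
X(t) \approx \mathcal{N}\bigl(m(t),\,C(t)\bigr).
\]
```

Here $Df(m)$ is the Jacobian of the drift evaluated along the mean path, so the covariance equation is linear in $C$ once $m$ has been computed.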

preprint · 2016 · arXiv

Hierarchical Bayesian Level Set Inversion

The level set approach has proven widely successful in the study of inverse problems for interfaces, since its systematic development in the 1990s. Recently it has been employed in the context of Bayesian inversion, allowing for the quantification of uncertainty within the reconstruction of interfaces. However the Bayesian approach is very sensitive to the length and amplitude scales in the prior probabilistic model. This paper demonstrates how the scale-sensitivity can be circumvented by means of a hierarchical approach, using a single scalar parameter. Together with careful consideration of the development of algorithms which encode probability measure equivalences as the hierarchical parameter is varied, this leads to well-defined Gibbs based MCMC methods found by alternating Metropolis-Hastings updates of the level set function and the hierarchical parameter. These methods demonstrably outperform non-hierarchical Bayesian level set methods.

preprint · 2016 · arXiv

MAP Estimators for Piecewise Continuous Inversion

We study the inverse problem of estimating a field $u$ from data comprising a finite set of nonlinear functionals of $u$, subject to additive noise; we denote this observed data by $y$. Our interest is in the reconstruction of piecewise continuous fields in which the discontinuity set is described by a finite number of geometric parameters. Natural applications include groundwater flow and electrical impedance tomography. We take a Bayesian approach, placing a prior distribution on $u$ and determining the conditional distribution on $u$ given the data $y$. It is then natural to study maximum a posteriori (MAP) estimators. Recently (Dashti et al 2013) it has been shown that MAP estimators can be characterised as minimisers of a generalised Onsager-Machlup functional, in the case where the prior measure is a Gaussian random field. We extend this theory to a more general class of prior distributions which allows for piecewise continuous fields. Specifically, the prior field is assumed to be piecewise Gaussian with random interfaces between the different Gaussians defined by a finite number of parameters. We also make connections with recent work on MAP estimators for linear problems and possibly non-Gaussian priors (Helin, Burger 2015) which employs the notion of Fomin derivative. In showing applicability of our theory we focus on the groundwater flow and EIT models, though the theory holds more generally. Numerical experiments are implemented for the groundwater flow model, demonstrating the feasibility of determining MAP estimators for these piecewise continuous models, but also that the geometric formulation can lead to multiple nearby (local) MAP estimators. We relate these MAP estimators to the behaviour of output from MCMC samples of the posterior, obtained using a state-of-the-art function space Metropolis-Hastings method.

preprint · 2016 · arXiv

The Bayesian Formulation of EIT: Analysis and Algorithms

We provide a rigorous Bayesian formulation of the EIT problem in an infinite dimensional setting, leading to well-posedness in the Hellinger metric with respect to the data. We focus particularly on the reconstruction of binary fields where the interface between different media is the primary unknown. We consider three different prior models: log-Gaussian, star-shaped and level set. Numerical simulations based on the implementation of MCMC are performed, illustrating the advantages and disadvantages of each type of prior in the reconstruction, in the case where the true conductivity is a binary field, and exhibiting the properties of the resulting posterior distribution.

preprint · 2016 · arXiv

Weak error estimates for trajectories of SPDEs for Spectral Galerkin discretization

We consider stochastic semi-linear evolution equations which are driven by additive, spatially correlated, Wiener noise, and in particular consider problems of heat equation (analytic semigroup) and damped-driven wave equations (bounded semigroup) type. We discretize these equations by means of a spectral Galerkin projection, and we study the approximation of the probability distribution of the trajectories: test functions are regular, but depend on the values of the process on the interval $[0,T]$. We introduce a new approach in the context of quantitative weak error analysis for discretization of SPDEs. The weak error is formulated using a deterministic function (Itô map) of the stochastic convolution found when the nonlinear term is dropped. The regularity properties of the Itô map are exploited, and in particular second-order Taylor expansions employed, to transfer the error from spectral approximation of the stochastic convolution into the weak error of interest. We prove that the weak rate of convergence is twice the strong rate of convergence in two situations. First, we assume that the covariance operator commutes with the generator of the semigroup: the first order term in the weak error expansion cancels out thanks to an independence property. Second, we remove the commuting assumption, and extend the previous result, thanks to the analysis of a new error term depending on a commutator.

preprint · 2015 · arXiv

A Bayesian Level Set Method for Geometric Inverse Problems

We introduce a level set based approach to Bayesian geometric inverse problems. In these problems the interface between different domains is the key unknown, and is realized as the level set of a function. This function itself becomes the object of the inference. Whilst the level set methodology has been widely used for the solution of geometric inverse problems, the Bayesian formulation that we develop here contains two significant advances: firstly it leads to a well-posed inverse problem in which the posterior distribution is Lipschitz with respect to the observed data; and secondly it leads to computationally expedient algorithms in which the level set itself is updated implicitly via the MCMC methodology applied to the level set function; no explicit velocity field is required for the level set interface. Applications are numerous and include medical imaging, modelling of subsurface formations and the inverse source problem; our theory is illustrated with computational results involving the last two applications.

preprint · 2015 · arXiv

Filter Based Methods For Statistical Linear Inverse Problems

Ill-posed inverse problems are ubiquitous in applications. Understanding of algorithms for their solution has been greatly enhanced by a deep understanding of the linear inverse problem. In the applied communities ensemble-based filtering methods have recently been used to solve inverse problems by introducing an artificial dynamical system. This opens up the possibility of using a range of other filtering methods, such as 3DVAR and Kalman based methods, to solve inverse problems, again by introducing an artificial dynamical system. The aim of this paper is to analyze such methods in the context of the ill-posed linear inverse problem. Statistical linear inverse problems are studied in the sense that the observational noise is assumed to be derived via realization of a Gaussian random variable. We investigate the asymptotic behavior of filter based methods for these inverse problems. Rigorous convergence rates are established for 3DVAR and for the Kalman filters, including minimax rates in some instances. Blowup of 3DVAR and a variant of its basic form is also presented, and optimality of the Kalman filter is discussed. These analyses reveal a close connection between (iterative) regularization schemes in deterministic inverse problems and filter based methods in data assimilation. Numerical experiments are presented to illustrate the theory.

preprint · 2015 · arXiv

The Bayesian Approach To Inverse Problems

These lecture notes highlight the mathematical and computational structure relating to the formulation of, and development of algorithms for, the Bayesian approach to inverse problems in differential equations. This approach is fundamental in the quantification of uncertainty within applications involving the blending of mathematical models with data.

preprint · 2014 · arXiv

A Function Space HMC Algorithm With Second Order Langevin Diffusion Limit

We describe a new MCMC method optimized for the sampling of probability measures on Hilbert space which have a density with respect to a Gaussian; such measures arise in the Bayesian approach to inverse problems, and in conditioned diffusions. Our algorithm is based on two key design principles: (i) algorithms which are well-defined in infinite dimensions result in methods which do not suffer from the curse of dimensionality when they are applied to approximations of the infinite dimensional target measure on $\mathbb{R}^N$; (ii) non-reversible algorithms can have better mixing properties compared to their reversible counterparts. The method we introduce is based on the hybrid Monte Carlo algorithm, tailored to incorporate these two design principles. The main result of this paper states that the new algorithm, appropriately rescaled, converges weakly to a second order Langevin diffusion on Hilbert space; as a consequence the algorithm explores the approximate target measures on $\mathbb{R}^N$ in a number of steps which is independent of $N$. We also present the underlying theory for the limiting non-reversible diffusion on Hilbert space, including characterization of the invariant measure, and we describe numerical simulations demonstrating that the proposed method has favourable mixing properties as an MCMC algorithm.

preprint · 2014 · arXiv

Algorithms for Kullback-Leibler Approximation of Probability Measures in Infinite Dimensions

In this paper we study algorithms to find a Gaussian approximation to a target measure defined on a Hilbert space of functions; the target measure itself is defined via its density with respect to a reference Gaussian measure. We employ the Kullback-Leibler divergence as a distance and find the best Gaussian approximation by minimizing this distance. It then follows that the approximate Gaussian must be equivalent to the Gaussian reference measure, defining a natural function space setting for the underlying calculus of variations problem. We introduce a computational algorithm which is well-adapted to the required minimization, seeking to find the mean as a function, and parameterizing the covariance in two different ways: through low rank perturbations of the reference covariance; and through Schrödinger potential perturbations of the inverse reference covariance. Two applications are shown: to a nonlinear inverse problem in elliptic PDEs, and to a conditioned diffusion process. We also show how the Gaussian approximations we obtain may be used to produce improved pCN-MCMC methods which are not only well-adapted to the high-dimensional setting, but also behave well with respect to small observational noise (resp. small temperatures) in the inverse problem (resp. conditioned diffusion).

preprint2014arXiv

Analysis of the Gibbs sampler for hierarchical inverse problems

Many inverse problems arising in applications come from continuum models where the unknown parameter is a field. In practice the unknown field is discretized resulting in a problem in $\mathbb{R}^N$, with an understanding that refining the discretization, that is increasing $N$, will often be desirable. In the context of Bayesian inversion this situation suggests the importance of two issues: (i) defining hyper-parameters in such a way that they are interpretable in the continuum limit $N \to \infty$ and so that their values may be compared between different discretization levels; (ii) understanding the efficiency of algorithms for probing the posterior distribution, as a function of large $N$. Here we address these two issues in the context of linear inverse problems subject to additive Gaussian noise within a hierarchical modelling framework based on a Gaussian prior for the unknown field and an inverse-gamma prior for a hyper-parameter, namely the amplitude of the prior variance. The structure of the model is such that the Gibbs sampler can be easily implemented for probing the posterior distribution. Subscribing to the dogma that one should think infinite-dimensionally before implementing in finite dimensions, we present function space intuition and provide rigorous theory showing that as $N$ increases, the component of the Gibbs sampler for sampling the amplitude of the prior variance becomes increasingly slow. We discuss a reparametrization of the prior variance that is robust with respect to the increase in dimension; we give numerical experiments which exhibit that our reparametrization prevents the slowing down. Our intuition on the behaviour of the prior hyper-parameter, with and without reparametrization, is sufficiently general to include a broad class of nonlinear inverse problems as well as other families of hyper-priors.

preprint2014arXiv

Determining White Noise Forcing From Eulerian Observations in the Navier Stokes Equation

The Bayesian approach to inverse problems is of paramount importance in quantifying uncertainty about the input to and the state of a system of interest given noisy observations. Herein we consider the forward problem of the forced 2D Navier Stokes equation. The inverse problem is inference of the forcing, and possibly the initial condition, given noisy observations of the velocity field. We place a prior on the forcing which is in the form of a spatially correlated temporally white Gaussian process, and formulate the inverse problem for the posterior distribution. Given appropriate spatial regularity conditions, we show that the solution is a continuous function of the forcing. Hence, for appropriately chosen spatial regularity in the prior, the posterior distribution on the forcing is absolutely continuous with respect to the prior and is hence well-defined. Furthermore, the posterior distribution is a continuous function of the data. We complement this theoretical result with numerical simulation of the posterior distribution.

preprint2014arXiv

Gradient Flow from a Random Walk in Hilbert Space

Consider a probability measure on a Hilbert space defined via its density with respect to a Gaussian. The purpose of this paper is to demonstrate that an appropriately defined Markov chain, which is reversible with respect to the measure in question, exhibits a diffusion limit to a noisy gradient flow, also reversible with respect to the same measure. The Markov chain is defined by applying a Metropolis-Hastings accept-reject mechanism to an Ornstein-Uhlenbeck proposal which is itself reversible with respect to the underlying Gaussian measure. The resulting noisy gradient flow is a stochastic partial differential equation driven by a Wiener process with spatial correlation given by the underlying Gaussian structure.
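
The chain described in this abstract can be illustrated with a minimal finite-dimensional sketch. It assumes a target with density proportional to exp(-phi) with respect to a centred Gaussian with diagonal covariance; the names `pcn_step`, `phi`, `prior_std` and `beta` are illustrative choices, not notation taken from the paper. The proposal is an Ornstein-Uhlenbeck (autoregressive) move that preserves the Gaussian reference measure, so the accept-reject step only involves the change in `phi`.

```python
import math
import random


def pcn_step(u, phi, prior_std, beta, rng):
    """One Metropolis-Hastings step with an Ornstein-Uhlenbeck proposal.

    Target: density proportional to exp(-phi(u)) with respect to the
    Gaussian N(0, diag(prior_std**2)).  All names are hypothetical.
    """
    # Draw xi ~ N(0, C) for the (assumed diagonal) reference covariance C.
    xi = [s * rng.gauss(0.0, 1.0) for s in prior_std]
    # OU proposal: reversible with respect to N(0, C) for any beta in (0, 1].
    scale = math.sqrt(1.0 - beta ** 2)
    v = [scale * ui + beta * x for ui, x in zip(u, xi)]
    # Because the proposal preserves the reference Gaussian, the acceptance
    # probability is min(1, exp(phi(u) - phi(v))).
    log_alpha = phi(u) - phi(v)
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return v, True
    return u, False
```

When `phi` is identically zero the target equals the reference Gaussian and every proposal is accepted, which is one quick sanity check on an implementation of this kind.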

preprint2014arXiv

Spectral gaps for a Metropolis-Hastings algorithm in infinite dimensions

We study the problem of sampling high and infinite dimensional target measures arising in applications such as conditioned diffusions and inverse problems. We focus on those that arise from approximating measures on Hilbert spaces defined via a density with respect to a Gaussian reference measure. We consider the Metropolis-Hastings algorithm that adds an accept-reject mechanism to a Markov chain proposal in order to make the chain reversible with respect to the target measure. We focus on cases where the proposal is either a Gaussian random walk (RWM) with covariance equal to that of the reference measure or an Ornstein-Uhlenbeck proposal (pCN) for which the reference measure is invariant. Previous results in terms of scaling and diffusion limits suggested that the pCN has a convergence rate that is independent of the dimension while the RWM method has undesirable dimension-dependent behaviour. We confirm this claim by exhibiting a dimension-independent Wasserstein spectral gap for the pCN algorithm for a large class of target measures. In our setting this Wasserstein spectral gap implies an $L^2$-spectral gap. We use both spectral gaps to show that the ergodic average satisfies a strong law of large numbers, the central limit theorem and nonasymptotic bounds on the mean square error, all dimension independent. In contrast we show that the spectral gap of the RWM algorithm applied to the reference measures degenerates as the dimension tends to infinity.

preprint2014arXiv

Well-Posed Bayesian Geometric Inverse Problems Arising in Subsurface Flow

In this paper, we consider the inverse problem of determining the permeability of the subsurface from hydraulic head measurements, within the framework of a steady Darcy model of groundwater flow. We study geometrically defined prior permeability fields, which admit layered, fault and channel structures, in order to mimic realistic subsurface features; within each layer we adopt either constant or continuous function representation of the permeability. This prior model leads to a parameter identification problem for a finite number of unknown parameters determining the geometry, together with either a finite number of permeability values (in the constant case) or a finite number of fields (in the continuous function case). We adopt a Bayesian framework showing existence and well-posedness of the posterior distribution. We also introduce novel Markov Chain-Monte Carlo (MCMC) methods, which exploit the different character of the geometric and permeability parameters, and build on recent advances in function space MCMC. These algorithms provide rigorous estimates of the permeability, as well as the uncertainty associated with it, and only require forward model evaluations. No adjoint solvers are required and hence the methodology is applicable to black-box forward models. We then use these methods to explore the posterior and to illustrate the methodology with numerical experiments.

preprint2013arXiv

Bayesian Posterior Contraction Rates for Linear Severely Ill-posed Inverse Problems

We consider a class of linear ill-posed inverse problems arising from inversion of a compact operator with singular values which decay exponentially to zero. We adopt a Bayesian approach, assuming a Gaussian prior on the unknown function. If the observational noise is assumed to be Gaussian then this prior is conjugate to the likelihood so that the posterior distribution is also Gaussian. We study Bayesian posterior consistency in the small observational noise limit. We assume that the forward operator and the prior and noise covariance operators commute with one another. We show how, for given smoothness assumptions on the truth, the scale parameter of the prior can be adjusted to optimize the rate of posterior contraction to the truth, and we explicitly compute the logarithmic rate.

preprint2013arXiv

Complexity Analysis of Accelerated MCMC Methods for Bayesian Inversion

We study Bayesian inversion for a model elliptic PDE with unknown diffusion coefficient. We provide complexity analyses of several Markov Chain-Monte Carlo (MCMC) methods for the efficient numerical evaluation of expectations under the Bayesian posterior distribution, given data $δ$. Particular attention is given to bounds on the overall work required to achieve a prescribed error level $\varepsilon$. Specifically, we first bound the computational complexity of "plain" MCMC, based on combining MCMC sampling with linear complexity multilevel solvers for elliptic PDE. Our (new) work versus accuracy bounds show that the complexity of this approach can be quite prohibitive. Two strategies for reducing the computational complexity are then proposed and analyzed: first, a sparse, parametric and deterministic generalized polynomial chaos (gpc) "surrogate" representation of the forward response map of the PDE over the entire parameter space, and, second, a novel Multi-Level Markov Chain Monte Carlo (MLMCMC) strategy which utilizes sampling from a multilevel discretization of the posterior and of the forward PDE. For both of these strategies we derive asymptotic bounds on work versus accuracy, and hence asymptotic bounds on the computational complexity of the algorithms. In particular we provide sufficient conditions on the regularity of the unknown coefficients of the PDE, and on the approximation methods used, in order for the accelerations of MCMC resulting from these strategies to lead to complexity reductions over "plain" MCMC algorithms for Bayesian inversion of PDEs.

preprint2013arXiv

MAP Estimators and Their Consistency in Bayesian Nonparametric Inverse Problems

We consider the inverse problem of estimating an unknown function $u$ from noisy measurements $y$ of a known, possibly nonlinear, map $\mathcal{G}$ applied to $u$. We adopt a Bayesian approach to the problem and work in a setting where the prior measure is specified as a Gaussian random field $μ_0$. We work under a natural set of conditions on the likelihood which imply the existence of a well-posed posterior measure, $μ^y$. Under these conditions we show that the maximum a posteriori (MAP) estimator is well-defined as the minimiser of an Onsager-Machlup functional defined on the Cameron-Martin space of the prior; thus we link a problem in probability with a problem in the calculus of variations. We then consider the case where the observational noise vanishes and establish a form of Bayesian posterior consistency. We also prove a similar result for the case where the observation of $\mathcal{G}(u)$ can be repeated as many times as desired with independent identically distributed noise. The theory is illustrated with examples from an inverse problem for the Navier-Stokes equation, motivated by problems arising in weather forecasting, and from the theory of conditioned diffusions, motivated by problems arising in molecular dynamics.

preprint2013arXiv

Posterior Contraction Rates for the Bayesian Approach to Linear Ill-Posed Inverse Problems

We consider a Bayesian nonparametric approach to a family of linear inverse problems in a separable Hilbert space setting with Gaussian noise. We assume Gaussian priors, which are conjugate to the model, and present a method of identifying the posterior using its precision operator. Working with the unbounded precision operator enables us to use partial differential equations (PDE) methodology to obtain rates of contraction of the posterior distribution to a Dirac measure centered on the true solution. Our methods assume a relatively weak relation between the prior covariance, noise covariance and forward operator, allowing for a wide range of applications.

preprint2013arXiv

The Ensemble Kalman Filter for Inverse Problems

The Ensemble Kalman filter (EnKF) was introduced by Evensen in 1994 [10] as a novel method for data assimilation: state estimation for noisily observed time-dependent problems. Since that time it has had enormous impact in many application domains because of its robustness and ease of implementation, and numerical evidence of its accuracy. In this paper we propose the application of an iterative ensemble Kalman method for the solution of a wide class of inverse problems. In this context we show that the estimate of the unknown function that we obtain with the ensemble Kalman method lies in a subspace A spanned by the initial ensemble. Hence the resulting error may be bounded above by the error found from the best approximation in this subspace. We provide numerical experiments which compare the error incurred by the ensemble Kalman method for inverse problems with the error of the best approximation in A, and with variants on traditional least-squares approaches, restricted to the subspace A. In so doing we demonstrate that the ensemble Kalman method for inverse problems provides a derivative-free optimization method with comparable accuracy to that achieved by traditional least-squares approaches. Furthermore, we also demonstrate that the accuracy is of the same order of magnitude as that achieved by the best approximation. Three examples are used to demonstrate these assertions: inversion of a compact linear operator; inversion of piezometric head to determine hydraulic conductivity in a Darcy model of groundwater flow; and inversion of Eulerian velocity measurements at positive times to determine the initial condition in an incompressible fluid.
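
The subspace property mentioned in this abstract can be seen in a simplified sketch of an ensemble Kalman update for an inverse problem. The version below is an assumption-laden illustration rather than the paper's exact scheme: it takes a scalar observation `y`, omits perturbed observations, and uses hypothetical names (`eki_update`, `forward`, `gamma`). Each particle is moved along the empirical cross-covariance between parameters and predictions, which is a linear combination of the centred ensemble members, so the updated particles stay in the span of the initial ensemble.

```python
def eki_update(ensemble, forward, y, gamma):
    """One derivative-free ensemble Kalman update (scalar data, sketch only).

    ensemble: list of parameter vectors (lists of floats)
    forward:  map from a parameter vector to a scalar prediction
    y:        observed scalar datum
    gamma:    observational noise variance (must be positive)
    """
    n_particles = len(ensemble)
    dim = len(ensemble[0])
    preds = [forward(u) for u in ensemble]
    u_bar = [sum(u[i] for u in ensemble) / n_particles for i in range(dim)]
    g_bar = sum(preds) / n_particles
    # Empirical cross-covariance between parameters and predictions.
    c_up = [
        sum((u[i] - u_bar[i]) * (g - g_bar) for u, g in zip(ensemble, preds))
        / n_particles
        for i in range(dim)
    ]
    # Empirical variance of the predictions.
    c_pp = sum((g - g_bar) ** 2 for g in preds) / n_particles
    updated = []
    for u, g in zip(ensemble, preds):
        gain = (y - g) / (c_pp + gamma)
        # The increment is a multiple of c_up, which lies in the span of the
        # centred ensemble; no derivative of `forward` is needed.
        updated.append([u[i] + c_up[i] * gain for i in range(dim)])
    return updated
```

For instance, if every initial particle has a zero third component, repeated calls to `eki_update` leave that component at zero regardless of how `forward` depends on it.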

preprint2012arXiv

Diffusion limits of the random walk Metropolis algorithm in high dimensions

Diffusion limits of MCMC methods in high dimensions provide a useful theoretical tool for studying computational complexity. In particular, they lead directly to precise estimates of the number of steps required to explore the target measure, in stationarity, as a function of the dimension of the state space. However, to date such results have mainly been proved for target measures with a product structure, severely limiting their applicability. The purpose of this paper is to study diffusion limits for a class of naturally occurring high-dimensional measures found from the approximation of measures on a Hilbert space which are absolutely continuous with respect to a Gaussian reference measure. The diffusion limit of a random walk Metropolis algorithm to an infinite-dimensional Hilbert space valued SDE (or SPDE) is proved, facilitating understanding of the computational complexity of the algorithm.

preprint2012arXiv

Evaluation of Gaussian approximations for data assimilation in reservoir models

In this paper we propose to numerically assess the performance of standard Gaussian approximations to probe the posterior distribution that arises from Bayesian data assimilation in petroleum reservoirs. In particular we assess the performance of (i) the linearization around the maximum a posteriori estimate, (ii) the randomized maximum likelihood and (iii) standard ensemble Kalman filter-type methods. In order to fully resolve the posterior distribution we implement a state-of-the-art MCMC method that scales well with respect to the dimension of the parameter space. Our implementation of the MCMC method provides the gold standard against which to assess the aforementioned Gaussian approximations. We present numerical synthetic experiments where we quantify the capability of each of the ad hoc Gaussian approximations in reproducing the mean and the variance of the posterior distribution (characterized via MCMC) associated to a data assimilation problem. The main objective of our controlled experiments is to exhibit the substantial discrepancies of the approximation properties of standard ad hoc Gaussian approximations. Numerical investigations of the type we present here will lead to greater understanding of the cost-efficient, but ad hoc, Bayesian techniques used for data assimilation in petroleum reservoirs, and hence ultimately to improved techniques with more accurate uncertainty quantification.

preprint2012arXiv

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions

The Metropolis-adjusted Langevin (MALA) algorithm is a sampling algorithm which makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite dimensional Hilbert space. These natural measures have density with respect to a Gaussian random field measure and arise in many applications such as Bayesian nonparametric statistics and the theory of conditioned diffusions. We prove that, started in stationarity, a suitably interpolated and scaled version of the Markov chain corresponding to MALA converges to an infinite dimensional diffusion process. Our results imply that, in stationarity, the MALA algorithm applied to an N-dimensional approximation of the target will take $\mathcal{O}(N^{1/3})$ steps to explore the invariant measure, comparing favorably with the Random Walk Metropolis which was recently shown to require $\mathcal{O}(N)$ steps when applied to the same class of problems.

preprint2011arXiv

Sampling conditioned hypoelliptic diffusions

A series of recent articles introduced a method to construct stochastic partial differential equations (SPDEs) which are invariant with respect to the distribution of a given conditioned diffusion. These works are restricted to the case of elliptic diffusions where the drift has a gradient structure and the resulting SPDE is of second-order parabolic type. The present article extends this methodology to allow the construction of SPDEs which are invariant with respect to the distribution of a class of hypoelliptic diffusion processes, subject to a bridge conditioning, leading to SPDEs which are of fourth-order parabolic type. This allows the treatment of more realistic physical models, for example, one can use the resulting SPDE to study transitions between meta-stable states in mechanical systems with friction and noise. In this situation the restriction of the drift being a gradient can also be lifted.

preprint2011arXiv

Uncertainty quantification and weak approximation of an elliptic inverse problem

We consider the inverse problem of determining the permeability from the pressure in a Darcy model of flow in a porous medium. Mathematically the problem is to find the diffusion coefficient for a linear uniformly elliptic partial differential equation in divergence form, in a bounded domain in dimension $d \le 3$, from measurements of the solution in the interior. We adopt a Bayesian approach to the problem. We place a prior random field measure on the log permeability, specified through the Karhunen-Loève expansion of its draws. We consider Gaussian measures constructed this way, and study the regularity of functions drawn from them. We also study the Lipschitz properties of the observation operator mapping the log permeability to the observations. Combining these regularity and continuity estimates, we show that the posterior measure is well-defined on a suitable Banach space. Furthermore the posterior measure is shown to be Lipschitz with respect to the data in the Hellinger metric, giving rise to a form of well-posedness of the inverse problem. Determining the posterior measure, given the data, solves the problem of uncertainty quantification for this inverse problem. In practice the posterior measure must be approximated in a finite dimensional space. We quantify the errors incurred by employing a truncated Karhunen-Loève expansion to represent this measure. In particular we study weak convergence of a general class of locally Lipschitz functions of the log permeability, and apply this general theory to estimate errors in the posterior mean of the pressure and the pressure covariance, under refinement of the finite dimensional Karhunen-Loève truncation.

preprint2010arXiv

Convergence of Numerical Time-Averaging and Stationary Measures via Poisson Equations

Numerical approximation of the long time behavior of a stochastic differential equation (SDE) is considered. Error estimates for time-averaging estimators are obtained and then used to show that the stationary behavior of the numerical method converges to that of the SDE. The error analysis is based on using an associated Poisson equation for the underlying SDE. The main advantage of this approach is its simplicity and universality. It works equally well for a range of explicit and implicit schemes including those with simple simulation of random variables, and for hypoelliptic SDEs. To simplify the exposition, we consider only the case where the state space of the SDE is a torus and we study only smooth test functions. However we anticipate that the approach can be applied more widely. An analogy between our approach and Stein's method is indicated. Some practical implications of the results are discussed.

preprint2010arXiv

Optimal tuning of the Hybrid Monte-Carlo Algorithm

We investigate the properties of the Hybrid Monte-Carlo algorithm (HMC) in high dimensions. HMC develops a Markov chain reversible w.r.t. a given target distribution $Π$ by using separable Hamiltonian dynamics with potential $-\logΠ$. The additional momentum variables are chosen at random from the Boltzmann distribution and the continuous-time Hamiltonian dynamics are then discretised using the leapfrog scheme. The induced bias is removed via a Metropolis-Hastings accept/reject rule. In the simplified scenario of independent, identically distributed components, we prove that, to obtain an $\mathcal{O}(1)$ acceptance probability as the dimension $d$ of the state space tends to $\infty$, the leapfrog step-size $h$ should be scaled as $h= l \times d^{-1/4}$. Therefore, in high dimensions, HMC requires $\mathcal{O}(d^{1/4})$ steps to traverse the state space. We also identify analytically the asymptotically optimal acceptance probability, which turns out to be 0.651 (to three decimal places). This is the choice which optimally balances the cost of generating a proposal, which decreases as $l$ increases, against the cost related to the average number of proposals required to obtain acceptance, which increases as $l$ increases.
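
The ingredients listed in this abstract (random momentum, leapfrog discretisation, accept/reject correction, and the step-size scaling $h = l \times d^{-1/4}$) can be put together in a minimal sketch. It assumes unit mass, so the momentum is standard Gaussian, and the function names `step_size`, `leapfrog` and `hmc_step` as well as the arguments `potential` and `grad_potential` (standing for $-\log Π$ and its gradient) are illustrative rather than taken from the paper.

```python
import math
import random


def step_size(l, d):
    """Leapfrog step size h = l * d**(-1/4) for state-space dimension d."""
    return l * d ** (-0.25)


def leapfrog(q, p, grad_potential, h, n_steps):
    """Integrate separable Hamiltonian dynamics with the leapfrog scheme."""
    q, p = list(q), list(p)
    g = grad_potential(q)
    for _ in range(n_steps):
        p = [pi - 0.5 * h * gi for pi, gi in zip(p, g)]  # half step in momentum
        q = [qi + h * pi for qi, pi in zip(q, p)]        # full step in position
        g = grad_potential(q)
        p = [pi - 0.5 * h * gi for pi, gi in zip(p, g)]  # half step in momentum
    return q, p


def hmc_step(q, potential, grad_potential, h, n_steps, rng):
    """One HMC transition: fresh momentum, leapfrog, then accept/reject."""
    p0 = [rng.gauss(0.0, 1.0) for _ in q]
    q1, p1 = leapfrog(q, p0, grad_potential, h, n_steps)
    energy0 = potential(q) + 0.5 * sum(x * x for x in p0)
    energy1 = potential(q1) + 0.5 * sum(x * x for x in p1)
    # Accept with probability min(1, exp(energy0 - energy1)); this removes
    # the bias introduced by the discretisation.
    log_alpha = energy0 - energy1
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return q1, True
    return q, False
```

In this sketch, integrating over a fixed time horizon with `h = step_size(l, d)` takes a number of leapfrog steps proportional to $d^{1/4}$, which is the cost scaling quoted in the abstract.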

40 published item(s)

Amortized Energy-Based Bayesian Inference

Learning Homogenization for Elliptic Operators

A Framework for Machine Learning of Model Error in Dynamical Systems

Efficient Derivative-free Bayesian Inference for Large-Scale Inverse Problems

Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods

Iterated Kalman Methodology For Inverse Problems

Learning Markovian Homogenized Models in Viscoelasticity

The Cost-Accuracy Trade-Off In Operator Learning With Neural Networks

The Random Feature Model for Input-Output Maps between Banach Spaces

Consistency of semi-supervised learning algorithms on graphs: Probit and one-hot methods

Reconciling Bayesian and perimeter regularization for binary inversion

Spectral Analysis Of Weighted Laplacians Arising In Data Clustering

Analysis of the ensemble Kalman filter for inverse problems

Gaussian Approximations of Small Noise Diffusions in Kullback-Leibler Divergence

Hierarchical Bayesian Level Set Inversion

MAP Estimators for Piecewise Continuous Inversion

The Bayesian Formulation of EIT: Analysis and Algorithms

Weak error estimates for trajectories of SPDEs for Spectral Galerkin discretization

A Bayesian Level Set Method for Geometric Inverse Problems

Filter Based Methods For Statistical Linear Inverse Problems

The Bayesian Approach To Inverse Problems

A Function Space HMC Algorithm With Second Order Langevin Diffusion Limit

Algorithms for Kullback-Leibler Approximation of Probability Measures in Infinite Dimensions

Analysis of the Gibbs sampler for hierarchical inverse problems

Determining White Noise Forcing From Eulerian Observations in the Navier Stokes Equation

Gradient Flow from a Random Walk in Hilbert Space

Spectral gaps for a Metropolis-Hastings algorithm in infinite dimensions

Well-Posed Bayesian Geometric Inverse Problems Arising in Subsurface Flow

Bayesian Posterior Contraction Rates for Linear Severely Ill-posed Inverse Problems

Complexity Analysis of Accelerated MCMC Methods for Bayesian Inversion

MAP Estimators and Their Consistency in Bayesian Nonparametric Inverse Problems

Posterior Contraction Rates for the Bayesian Approach to Linear Ill-Posed Inverse Problems

The Ensemble Kalman Filter for Inverse Problems

Diffusion limits of the random walk Metropolis algorithm in high dimensions

Evaluation of Gaussian approximations for data assimilation in reservoir models

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions

Sampling conditioned hypoelliptic diffusions

Uncertainty quantification and weak approximation of an elliptic inverse problem

Convergence of Numerical Time-Averaging and Stationary Measures via Poisson Equations

Optimal tuning of the Hybrid Monte-Carlo Algorithm