Source author record

Omar Ghattas

Omar Ghattas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.NA; math.OC; Numerical Analysis; Computation; Methodology; Machine Learning; Computational Engineering, Finance, and Science; Mathematical Software; Distributed, Parallel, and Cluster Computing; math.ST; Statistics Theory; Applications; astro-ph.CO; cond-mat.mtrl-sci; Neural and Evolutionary Computing; physics.comp-ph; physics.geo-ph

Catalog footprint

25 works, 17 topics, 4 close collaborators

preprint, 2022, arXiv

A stochastic Stein Variational Newton method

Stein variational gradient descent (SVGD) is a general-purpose optimization-based sampling algorithm that has recently exploded in popularity, but is limited by two issues: it is known to produce biased samples, and it can be slow to converge on complicated distributions. A recently proposed stochastic variant of SVGD (sSVGD) addresses the first issue, producing unbiased samples by incorporating a special noise into the SVGD dynamics such that asymptotic convergence is guaranteed. Meanwhile, Stein variational Newton (SVN), a Newton-like extension of SVGD, dramatically accelerates the convergence of SVGD by incorporating Hessian information into the dynamics, but also produces biased samples. In this paper we derive, and provide a practical implementation of, a stochastic variant of SVN (sSVN) which is both asymptotically correct and converges rapidly. We demonstrate the effectiveness of our algorithm on a difficult class of test problems -- the Hybrid Rosenbrock density -- and show that sSVN converges using three orders of magnitude fewer gradient evaluations of the log likelihood than its stochastic SVGD counterpart. Our results show that sSVN is a promising approach to accelerating high-precision Bayesian inference tasks with modest-dimension, $d\sim\mathcal{O}(10)$.
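The baseline SVGD dynamics that the abstract builds on can be sketched in a few lines. This is a minimal illustration of plain SVGD only, not of sSVGD or sSVN: it adds no noise and uses no Hessian information. The one-dimensional standard normal target, the fixed kernel bandwidth, the step size, and all function names are assumptions made for the example.

```python
import math


def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """One plain SVGD update for 1D particles with a Gaussian (RBF) kernel."""
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xi - xj) ** 2) / (2.0 * h2))
            # Driving term: kernel-weighted gradient of the log density.
            drive = k * grad_log_p(xj)
            # Repulsive term: derivative of k(xj, xi) with respect to xj,
            # which keeps the particles from collapsing onto the mode.
            repulse = k * (xi - xj) / h2
            phi += drive + repulse
        updated.append(xi + step * phi / n)
    return updated


def grad_log_standard_normal(x):
    # d/dx log N(x; 0, 1) = -x (assumed toy target).
    return -x


def run_svgd(n_particles=20, n_steps=1000):
    # Deterministic start away from the target: evenly spaced on [2, 4].
    particles = [2.0 + 2.0 * i / (n_particles - 1) for i in range(n_particles)]
    for _ in range(n_steps):
        particles = svgd_step(particles, grad_log_standard_normal)
    return particles


particles = run_svgd()
mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
print(f"mean = {mean:.4f}, spread = {spread:.4f}")
```

The particles drift toward the target and the repulsive term keeps them apart. With a finite particle set the resulting ensemble is only an approximation of the target, which is the bias the stochastic variants are designed to remove.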

preprint, 2022, arXiv

An efficient method for goal-oriented linear Bayesian optimal experimental design: Application to optimal sensor placement

Optimal experimental design (OED) plays an important role in the problem of identifying uncertainty with limited experimental data. In many applications, we seek to minimize the uncertainty of a predicted quantity of interest (QoI) based on the solution of the inverse problem, rather than the inversion model parameter itself. In these scenarios, we develop an efficient method for goal-oriented optimal experimental design (GOOED) for large-scale Bayesian linear inverse problems that finds sensor locations to maximize the expected information gain (EIG) for a predicted QoI. By deriving a new formula to compute the EIG, exploiting low-rank structures of two appropriate operators, we are able to employ an online-offline decomposition scheme and a swapping greedy algorithm to maximize the EIG at a cost measured in model solutions that is independent of the problem dimensions. We provide detailed error analysis of the approximated EIG, and demonstrate the efficiency, accuracy, and both data- and parameter-dimension independence of the proposed algorithm for a contaminant transport inverse problem with an infinite-dimensional parameter field.
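To make the idea of EIG-driven sensor selection concrete, here is a small sketch under strong simplifying assumptions. It uses the standard closed-form EIG for a linear Gaussian model, $\frac{1}{2}\log\det(I + \sigma^{-2}F^{\top}F)$ with a standard normal prior, computed for the parameter itself rather than for a predicted QoI, and it uses a plain one-pass greedy loop rather than the swapping greedy algorithm of the paper. The two-parameter model, the candidate sensor rows, and the function names are hypothetical.

```python
import math


def eig_linear_gaussian(rows, noise_var=1.0):
    """EIG = 0.5 * log det(I + F^T F / noise_var) for a 2-parameter linear
    model with a standard normal prior; `rows` are the selected rows of F."""
    a11, a12, a22 = 1.0, 0.0, 1.0
    for f1, f2 in rows:
        a11 += f1 * f1 / noise_var
        a12 += f1 * f2 / noise_var
        a22 += f2 * f2 / noise_var
    return 0.5 * math.log(a11 * a22 - a12 * a12)


def greedy_sensors(candidates, budget, noise_var=1.0):
    """Pick `budget` sensors one at a time, each maximizing the EIG."""
    chosen = []
    for _ in range(budget):
        best_idx, best_value = None, -math.inf
        for idx in range(len(candidates)):
            if idx in chosen:
                continue
            trial = [candidates[i] for i in chosen] + [candidates[idx]]
            value = eig_linear_gaussian(trial, noise_var)
            if value > best_value:
                best_idx, best_value = idx, value
        chosen.append(best_idx)
    final_eig = eig_linear_gaussian([candidates[i] for i in chosen], noise_var)
    return chosen, final_eig


# Hypothetical candidate sensors: each row maps the 2 parameters to 1 datum.
candidates = [(1.0, 0.0), (0.9, 0.1), (0.0, 0.8), (0.1, 0.2)]
chosen, eig_value = greedy_sensors(candidates, budget=2)
print(chosen, eig_value)
```

In this toy setup the second pick favors a sensor that informs a direction the first one does not, even though another candidate has a larger norm. That is the redundancy effect an EIG criterion is meant to capture.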

preprint, 2022, arXiv

Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport

We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is generally intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport together with the construction of summary statistics for the image data. We also show that expected information gains (EIGs) from the observed data about the model parameters can be computed with no significant additional cost. Lastly, we present a numerical case study based on the Ohta--Kawasaki model for diblock copolymer thin film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-specific energy- and Fourier-based summary statistics, and quantify their informativeness using EIG. We demonstrate the power of the proposed approach to study the effect of data corruptions and experimental designs on the calibration results.

preprint, 2022, arXiv

hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty

Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the-art algorithms aimed to overcome the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically-generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods on several inverse problems. These include problems with linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large ($\sim 50\times$) speedups over conventional black box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.

preprint, 2022, arXiv

Large-scale Bayesian optimal experimental design with derivative-informed projected neural network

We address the solution of large-scale Bayesian optimal experimental design (OED) problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields. The OED problem seeks to find sensor locations that maximize the expected information gain (EIG) in the solution of the underlying Bayesian inverse problem. Computation of the EIG is usually prohibitive for PDE-based OED problems. To make the evaluation of the EIG tractable, we approximate the (PDE-based) parameter-to-observable map with a derivative-informed projected neural network (DIPNet) surrogate, which exploits the geometry, smoothness, and intrinsic low-dimensionality of the map using a small and dimension-independent number of PDE solves. The surrogate is then deployed within a greedy algorithm-based solution of the OED problem such that no further PDE solves are required. We analyze the EIG approximation error in terms of the generalization error of the DIPNet and show they are of the same order. Finally, the efficiency and accuracy of the method are demonstrated via numerical experiments on OED problems governed by inverse scattering and inverse reactive transport with up to 16,641 uncertain parameters and 100 experimental design variables, where we observe up to three orders of magnitude speedup relative to a reference double loop Monte Carlo method.

preprint, 2022, arXiv

Nonuniform 3D finite difference elastic wave simulation on staggered grids

We present an approach to simulate the 3D isotropic elastic wave propagation using nonuniform finite difference discretization on staggered grids. Specifically, we consider simulation domains composed of layers of uniform grids with different grid spacings, separated by nonconforming interfaces. We demonstrate that this layer-wise finite difference discretization has the potential to significantly reduce the simulation cost, compared to its fully uniform counterpart. Stability of such a discretization is achieved by using specially designed difference operators, which are variants of the standard difference operators with adaptations near boundaries or interfaces, and penalty terms, which are appended to the discretized wave system to weakly impose boundary or interface conditions. Combined with specially designed interpolation operators, the discretized wave system is shown to preserve the energy conserving property of the continuous elastic wave equation, and $\textit{a fortiori}$ ensure the stability of the simulation. Numerical examples are presented to demonstrate the efficacy of the proposed simulation approach.

preprint, 2020, arXiv

Hierarchical Matrix Approximations of Hessians Arising in Inverse Problems Governed by PDEs

Hessian operators arising in inverse problems governed by partial differential equations (PDEs) play a critical role in delivering efficient, dimension-independent convergence for both Newton solution of deterministic inverse problems, as well as Markov chain Monte Carlo sampling of posteriors in the Bayesian setting. These methods require the ability to repeatedly perform such operations on the Hessian as multiplication with arbitrary vectors, solving linear systems, inversion, and (inverse) square root. Unfortunately, the Hessian is a (formally) dense, implicitly-defined operator that is intractable to form explicitly for practical inverse problems, requiring as many PDE solves as inversion parameters. Low rank approximations are effective when the data contain limited information about the parameters, but become prohibitive as the data become more informative. However, the Hessians for many inverse problems arising in practical applications can be well approximated by matrices that have hierarchically low rank structure. Hierarchical matrix representations promise to overcome the high complexity of dense representations and provide effective data structures and matrix operations that have only log-linear complexity. In this work, we describe algorithms for constructing and updating hierarchical matrix approximations of Hessians, and illustrate them on a number of representative inverse problems involving time-dependent diffusion, advection-dominated transport, frequency domain acoustic wave propagation, and low frequency Maxwell equations, demonstrating up to an order of magnitude speedup compared to globally low rank approximations.

preprint, 2020, arXiv

hIPPYlib: An Extensible Software Framework for Large-Scale Inverse Problems Governed by PDEs; Part I: Deterministic Inversion and Linearized Bayesian Inference

We present an extensible software framework, hIPPYlib, for solution of large-scale deterministic and Bayesian inverse problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields (which are high-dimensional after discretization). hIPPYlib overcomes the prohibitive nature of Bayesian inversion for this class of problems by implementing state-of-the-art scalable algorithms for PDE-based inverse problems that exploit the structure of the underlying operators, notably the Hessian of the log-posterior. The key property of the algorithms implemented in hIPPYlib is that the solution of the deterministic and linearized Bayesian inverse problem is computed at a cost, measured in linearized forward PDE solves, that is independent of the parameter dimension. The mean of the posterior is approximated by the MAP point, which is found by minimizing the negative log-posterior. This deterministic nonlinear least-squares optimization problem is solved with an inexact matrix-free Newton-CG method. The posterior covariance is approximated by the inverse of the Hessian of the negative log posterior evaluated at the MAP point. This Gaussian approximation is exact when the parameter-to-observable map is linear; otherwise, its logarithm agrees to two derivatives with the log-posterior at the MAP point, and thus it can serve as a proposal for Hessian-based MCMC methods. The construction of the posterior covariance is made tractable by invoking a low-rank approximation of the Hessian of the log-likelihood. Scalable tools for sample generation are also implemented. hIPPYlib makes all of these advanced algorithms easily accessible to domain scientists and provides an environment that expedites the development of new algorithms. hIPPYlib is also a teaching tool to educate researchers and practitioners who are new to inverse problems and the Bayesian inference framework.
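For the linear case mentioned in the abstract, the Gaussian approximation can be written out explicitly. The notation below is assumed for illustration and does not come from the abstract: $F$ is the linear parameter-to-observable map, $d$ the data, $\Gamma_{\text{noise}}$ and $\Gamma_{\text{pr}}$ the noise and prior covariances, $m_{\text{pr}}$ the prior mean, and $(\lambda_i, v_i)$ the $r$ dominant generalized eigenpairs of the data-misfit Hessian with respect to the prior precision.

```latex
\begin{aligned}
H &= F^{*}\Gamma_{\text{noise}}^{-1}F + \Gamma_{\text{pr}}^{-1}, \\
m_{\text{MAP}} &= H^{-1}\left(F^{*}\Gamma_{\text{noise}}^{-1}d + \Gamma_{\text{pr}}^{-1}m_{\text{pr}}\right),
\qquad \Gamma_{\text{post}} = H^{-1}, \\
F^{*}\Gamma_{\text{noise}}^{-1}F\,v_i &= \lambda_i\,\Gamma_{\text{pr}}^{-1}v_i,
\qquad v_i^{*}\Gamma_{\text{pr}}^{-1}v_j = \delta_{ij}, \\
\Gamma_{\text{post}} &\approx \Gamma_{\text{pr}} - \sum_{i=1}^{r}\frac{\lambda_i}{1+\lambda_i}\,v_i v_i^{*}.
\end{aligned}
```

The last line is exact when all eigenpairs are retained. Truncating to the $r$ largest $\lambda_i$ discards only directions with $\lambda_i \ll 1$, where the data add little to the prior, which is why a low-rank approximation of the log-likelihood Hessian can keep the posterior covariance tractable.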

preprint, 2020, arXiv

Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.

preprint, 2020, arXiv

Projected Stein Variational Gradient Descent

The curse of dimensionality is a longstanding challenge in Bayesian inference in high dimensions. In this work, we propose a projected Stein variational gradient descent (pSVGD) method to overcome this challenge by exploiting the fundamental property of intrinsic low dimensionality of the data-informed subspace stemming from ill-posedness of such problems. We adaptively construct the subspace using a gradient information matrix of the log-likelihood, and apply pSVGD to the much lower-dimensional coefficients of the parameter projection. The method is demonstrated to be more accurate and efficient than SVGD. It is also shown to be more scalable with respect to the number of parameters, samples, data points, and processor cores via experiments with parameter dimensions ranging from the hundreds to the tens of thousands.

preprint, 2020, arXiv

Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions

We propose a fast and scalable variational method for Bayesian inference in high-dimensional parameter space, which we call projected Stein variational Newton (pSVN) method. We exploit the intrinsic low-dimensional geometric structure of the posterior distribution in the high-dimensional parameter space via its Hessian (of the log posterior) operator and perform a parallel update of the parameter samples projected into a low-dimensional subspace by an SVN method. The subspace is adaptively constructed using the eigenvectors of the averaged Hessian at the current samples. We demonstrate fast convergence of the proposed method and its scalability with respect to the number of parameters, samples, and processor cores.

preprint, 2020, arXiv

Stein variational reduced basis Bayesian inversion

We propose and analyze a Stein variational reduced basis method (SVRB) to solve large-scale PDE-constrained Bayesian inverse problems. To address the computational challenge of drawing numerous samples requiring expensive PDE solves from the posterior distribution, we integrate an adaptive and goal-oriented model reduction technique with an optimization-based Stein variational gradient descent method (SVGD). The samples are drawn from the prior distribution and iteratively pushed to the posterior by a sequence of transport maps, which are constructed by SVGD, requiring the evaluation of the potential---the negative log of the likelihood function---and its gradient with respect to the random parameters, which depend on the solution of the PDE. To reduce the computational cost, we develop an adaptive and goal-oriented model reduction technique based on reduced basis approximations for the evaluation of the potential and its gradient. We present a detailed analysis for the reduced basis approximation errors of the potential and its gradient, the induced errors of the posterior distribution measured by Kullback--Leibler divergence, as well as the errors of the samples. To demonstrate the computational accuracy and efficiency of SVRB, we report results of numerical experiments on a Bayesian inverse problem governed by a diffusion PDE with random parameters with both uniform and Gaussian prior distributions. Over 100X speedups can be achieved while the accuracy of the approximation of the potential and its gradient is preserved.

preprint, 2020, arXiv

Tensor train construction from tensor actions, with application to compression of large high order derivative tensors

We present a method for converting tensors into tensor train format based on actions of the tensor as a vector-valued multilinear function. Existing methods for constructing tensor trains require access to "array entries" of the tensor and are therefore inefficient or computationally prohibitive if the tensor is accessible only through its action, especially for high order tensors. Our method permits efficient tensor train compression of large high order derivative tensors for nonlinear mappings that are implicitly defined through the solution of a system of equations. Array entries of these derivative tensors are not directly accessible, but actions of these tensors can be computed efficiently via a procedure that we discuss. Such tensors are often amenable to tensor train compression in theory, but until now no efficient algorithm existed to convert them into tensor train format. We demonstrate our method by compressing a Hilbert tensor of size $41 \times 42 \times 43 \times 44 \times 45$, and by forming high order (up to $5^\text{th}$ order derivatives/$6^\text{th}$ order tensors) Taylor series surrogates of the noise-whitened parameter-to-output map for a stochastic partial differential equation with boundary output.

preprint, 2017, arXiv

Weighted BFBT Preconditioner for Stokes Flow Problems with Highly Heterogeneous Viscosity

We present a weighted BFBT approximation (w-BFBT) to the inverse Schur complement of a Stokes system with highly heterogeneous viscosity. When used as part of a Schur complement-based Stokes preconditioner, we observe robust fast convergence for Stokes problems with smooth but highly varying (up to 10 orders of magnitude) viscosities, optimal algorithmic scalability with respect to mesh refinement, and only a mild dependence on the polynomial order of high-order finite element discretizations ($Q_k \times P_{k-1}^{disc}$, order $k \ge 2$). For certain difficult problems, we demonstrate numerically that w-BFBT significantly improves Stokes solver convergence over the widely used inverse viscosity-weighted pressure mass matrix approximation of the Schur complement. In addition, we derive theoretical eigenvalue bounds to prove spectral equivalence of w-BFBT. Using detailed numerical experiments, we discuss modifications to w-BFBT at Dirichlet boundaries that decrease the number of iterations. The overall algorithmic performance of the Stokes solver is governed by the efficacy of w-BFBT as a Schur complement approximation and, in addition, by our parallel hybrid spectral-geometric-algebraic multigrid (HMG) method, which we use to approximate the inverses of the viscous block and variable-coefficient pressure Poisson operators within w-BFBT. Building on the scalability of HMG, our Stokes solver achieves a parallel efficiency of 90% while weak scaling over a more than 600-fold increase from 48 to all 30,000 cores of TACC's Lonestar 5 supercomputer.

preprint, 2016, arXiv

A randomized maximum a posterior method for posterior sampling of high dimensional nonlinear Bayesian inverse problems

We present a randomized maximum a posteriori (rMAP) method for generating approximate samples of posteriors in high dimensional Bayesian inverse problems governed by large-scale forward problems. We derive the rMAP approach by: 1) casting the problem of computing the MAP point as a stochastic optimization problem; 2) interchanging optimization and expectation; and 3) approximating the expectation with a Monte Carlo method. For a specific randomized data and prior mean, rMAP reduces to the randomized maximum likelihood approach (RML). It can also be viewed as an iterative stochastic Newton method. An analysis of the convergence of the rMAP samples is carried out for both linear and nonlinear inverse problems. Each rMAP sample requires solution of a PDE-constrained optimization problem; to solve these problems, we employ a state-of-the-art trust region inexact Newton conjugate gradient method with sensitivity-based warm starts. An approximate Metropolization approach is presented to reduce the bias in rMAP samples. Various numerical methods will be presented to demonstrate the potential of the rMAP approach in posterior sampling of nonlinear Bayesian inverse problems in high dimensions.
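The randomize-then-optimize idea behind the RML special case can be illustrated on a scalar linear Gaussian problem, where each perturbed optimization problem has a closed-form minimizer and the minimizers are exact posterior samples. This is only a sketch under those assumptions: the scalar model $d = a\,m + \text{noise}$, the parameter values, and the function name are hypothetical, and the closed-form minimizer stands in for the trust region Newton-CG solver that a PDE-constrained problem would need.

```python
import random


def rml_samples(d, a, noise_sd, prior_mean, prior_sd, n_samples, seed=0):
    """Draw samples by perturbing the data and the prior mean, then
    minimizing the perturbed negative log-posterior (closed form here)."""
    rng = random.Random(seed)
    precision = a * a / noise_sd ** 2 + 1.0 / prior_sd ** 2
    samples = []
    for _ in range(n_samples):
        # Randomize the data with noise-distributed perturbations ...
        d_pert = d + noise_sd * rng.gauss(0.0, 1.0)
        # ... and the prior mean with prior-distributed perturbations.
        m0_pert = prior_mean + prior_sd * rng.gauss(0.0, 1.0)
        # Minimizer of (a*m - d_pert)^2 / (2 noise_sd^2)
        #            + (m - m0_pert)^2 / (2 prior_sd^2).
        m_star = (a * d_pert / noise_sd ** 2 + m0_pert / prior_sd ** 2) / precision
        samples.append(m_star)
    return samples


# Assumed toy values: d = 1, a = 2, noise sd 0.5, prior N(0, 1).
samples = rml_samples(d=1.0, a=2.0, noise_sd=0.5, prior_mean=0.0,
                      prior_sd=1.0, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)

# Exact posterior for comparison: precision 17, mean 8/17, variance 1/17.
print(sample_mean, 8.0 / 17.0)
print(sample_var, 1.0 / 17.0)
```

For nonlinear problems the perturbed minimizers are no longer exact posterior samples, which is the bias the abstract's approximate Metropolization step is meant to reduce.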

preprint, 2015, arXiv

A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-dimensional Bayesian Nonlinear Inverse Problems

We address the problem of optimal experimental design (OED) for Bayesian nonlinear inverse problems governed by PDEs. The goal is to find a placement of sensors, at which experimental data are collected, so as to minimize the uncertainty in the inferred parameter field. We formulate the OED objective function by generalizing the classical A-optimal experimental design criterion using the expected value of the trace of the posterior covariance. We seek a method that solves the OED problem at a cost (measured in the number of forward PDE solves) that is independent of both the parameter and sensor dimensions. To facilitate this, we construct a Gaussian approximation to the posterior at the maximum a posteriori probability (MAP) point, and use the resulting covariance operator to define the OED objective function. We use randomized trace estimation to compute the trace of this (implicitly defined) covariance operator. The resulting OED problem includes as constraints the PDEs characterizing the MAP point, and the PDEs describing the action of the covariance operator to vectors. The sparsity of the sensor configurations is controlled using sparsifying penalty functions. We elaborate our OED method for the problem of determining the sensor placement to best infer the coefficient of an elliptic PDE. Adjoint methods are used to compute the gradient of the PDE-constrained OED objective function. We provide numerical results for inference of the permeability field in a porous medium flow problem, and demonstrate that the number of PDE solves required for the evaluation of the OED objective function and its gradient is essentially independent of both the parameter and sensor dimensions. The number of quasi-Newton iterations for computing an OED also exhibits the same dimension invariance properties.

preprint, 2015, arXiv

Recursive Algorithms for Distributed Forests of Octrees

The forest-of-octrees approach to parallel adaptive mesh refinement and coarsening (AMR) has recently been demonstrated in the context of a number of large-scale PDE-based applications. Although linear octrees, which store only leaf octants, have an underlying tree structure by definition, it is not often exploited in previously published mesh-related algorithms. This is because the branches are not explicitly stored, and because the topological relationships in meshes, such as the adjacency between cells, introduce dependencies that do not respect the octree hierarchy. In this work we combine hierarchical and topological relationships between octree branches to design efficient recursive algorithms. We present three important algorithms with recursive implementations. The first is a parallel search for leaves matching any of a set of multiple search criteria. The second is a ghost layer construction algorithm that handles arbitrarily refined octrees that are not covered by previous algorithms, which require a 2:1 condition between neighboring leaves. The third is a universal mesh topology iterator. This iterator visits every cell in a domain partition, as well as every interface (face, edge and corner) between these cells. The iterator calculates the local topological information for every interface that it visits, taking into account the nonconforming interfaces that increase the complexity of describing the local topology. To demonstrate the utility of the topology iterator, we use it to compute the numbering and encoding of higher-order $C^0$ nodal basis functions. We analyze the complexity of the new recursive algorithms theoretically, and assess their performance, both in terms of single-processor efficiency and in terms of parallel scalability, demonstrating good weak and strong scaling up to 458k cores of the JUQUEEN supercomputer.

preprint, 2015, arXiv

Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet

The majority of research on efficient and scalable algorithms in computational science and engineering has focused on the forward problem: given parameter inputs, solve the governing equations to determine output quantities of interest. In contrast, here we consider the broader question: given a (large-scale) model containing uncertain parameters, (possibly) noisy observational data, and a prediction quantity of interest, how do we construct efficient and scalable algorithms to (1) infer the model parameters from the data (the deterministic inverse problem), (2) quantify the uncertainty in the inferred parameters (the Bayesian inference problem), and (3) propagate the resulting uncertain parameters through the model to issue predictions with quantified uncertainties (the forward uncertainty propagation problem)? We present efficient and scalable algorithms for this end-to-end, data-to-prediction process under the Gaussian approximation and in the context of modeling the flow of the Antarctic ice sheet and its effect on sea level. The ice is modeled as a viscous, incompressible, creeping, shear-thinning fluid. The observational data come from InSAR satellite measurements of surface ice flow velocity, and the uncertain parameter field to be inferred is the basal sliding parameter. The prediction quantity of interest is the present-day ice mass flux from the Antarctic continent to the ocean. We show that the work required for executing this data-to-prediction process is independent of the state dimension, parameter dimension, data dimension, and number of processor cores. The key to achieving this dimension independence is to exploit the fact that the observational data typically provide only sparse information on model parameters. This property can be exploited to construct a low rank approximation of the linearized parameter-to-observable map.

preprint, 2015, arXiv

Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics

Motivated by the need for efficient and accurate simulation of the dynamics of the polar ice sheets, we design high-order finite element discretizations and scalable solvers for the solution of nonlinear incompressible Stokes equations. We focus on power-law, shear thinning rheologies used in modeling ice dynamics and other geophysical flows. We use nonconforming hexahedral meshes and the conforming inf-sup stable finite element velocity-pressure pairings $\mathbb{Q}_k\times \mathbb{Q}^\text{disc}_{k-2}$ or $\mathbb{Q}_k \times \mathbb{P}^\text{disc}_{k-1}$. To solve the nonlinear equations, we propose a Newton-Krylov method with a block upper triangular preconditioner for the linearized Stokes systems. The diagonal blocks of this preconditioner are sparse approximations of the (1,1)-block and of its Schur complement. The (1,1)-block is approximated using linear finite elements based on the nodes of the high-order discretization, and the application of its inverse is approximated using algebraic multigrid with an incomplete factorization smoother. This preconditioner is designed to be efficient on anisotropic meshes, which are necessary to match the high aspect ratio domains typical for ice sheets. We develop and make available extensions to two libraries---a hybrid meshing scheme for the p4est parallel AMR library, and a modified smoothed aggregation scheme for PETSc---to improve their support for solving PDEs in high aspect ratio domains. In a numerical study, we find that our solver yields fast convergence that is independent of the element aspect ratio, the occurrence of nonconforming interfaces, and of mesh refinement, and that depends only weakly on the polynomial finite element order. We simulate the ice flow in a realistic description of the Antarctic ice sheet derived from field data, and study the parallel scalability of our solver for problems with up to 383M unknowns.

preprint, 2014, arXiv

A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems

We address the numerical solution of infinite-dimensional inverse problems in the framework of Bayesian inference. In the Part I companion to this paper (arXiv.org:1308.1313), we considered the linearized infinite-dimensional inverse problem. Here in Part II, we relax the linearization assumption and consider the fully nonlinear infinite-dimensional inverse problem using a Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of sampling high-dimensional pdfs arising from Bayesian inverse problems governed by PDEs, we build on the stochastic Newton MCMC method. This method exploits problem structure by taking as a proposal density a local Gaussian approximation of the posterior pdf, whose construction is made tractable by invoking a low-rank approximation of its data misfit component of the Hessian. Here we introduce an approximation of the stochastic Newton proposal in which we compute the low-rank-based Hessian at just the MAP point, and then reuse this Hessian at each MCMC step. We compare the performance of the proposed method to the original stochastic Newton MCMC method and to an independence sampler. The comparison of the three methods is conducted on a synthetic ice sheet inverse problem. For this problem, the stochastic Newton MCMC method with a MAP-based Hessian converges at least as rapidly as the original stochastic Newton MCMC method, but is far cheaper since it avoids recomputing the Hessian at each step. On the other hand, it is more expensive per sample than the independence sampler; however, its convergence is significantly more rapid, and thus overall it is much cheaper. Finally, we present extensive analysis and interpretation of the posterior distribution, and classify directions in parameter space based on the extent to which they are informed by the prior or the observations.

preprint 2014 arXiv

A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized $\ell_0$-sparsification

We present an efficient method for computing A-optimal experimental designs for infinite-dimensional Bayesian linear inverse problems governed by partial differential equations (PDEs). Specifically, we address the problem of optimizing the location of sensors (at which observational data are collected) to minimize the uncertainty in the parameters estimated by solving the inverse problem, where the uncertainty is expressed by the trace of the posterior covariance. Computing optimal experimental designs (OEDs) is particularly challenging for inverse problems governed by computationally expensive PDE models with infinite-dimensional (or, after discretization, high-dimensional) parameters. To alleviate the computational cost, we exploit the problem structure and build a low-rank approximation of the parameter-to-observable map, preconditioned with the square root of the prior covariance operator. This relieves our method from expensive PDE solves when evaluating the optimal experimental design objective function and its derivatives. Moreover, we employ a randomized trace estimator for efficient evaluation of the OED objective function. We control the sparsity of the sensor configuration by employing a sequence of penalty functions that successively approximate the $\ell_0$-"norm"; this results in binary designs that characterize optimal sensor locations. We present numerical results for inference of the initial condition from spatio-temporal observations in a time-dependent advection-diffusion problem in two and three space dimensions. We find that an optimal design can be computed at a cost, measured in number of forward PDE solves, that is independent of the parameter and sensor dimensions. We demonstrate numerically that $\ell_0$-sparsified experimental designs obtained via a continuation method outperform $\ell_1$-sparsified designs.
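As a rough illustration of the A-optimal objective and the randomized trace estimator mentioned above, the following sketch uses small dense matrices. The names `posterior_cov` and `randomized_trace`, the scalar noise variance, and the dense forward matrix `F` are assumptions made for the example; the paper itself works with a low-rank, PDE-free surrogate rather than explicit inverses.

```python
import numpy as np

def posterior_cov(F, w, noise_var, prior_cov):
    # Gamma_post(w) = (F^T W F / noise_var + Gamma_prior^{-1})^{-1},
    # where W = diag(w) holds the sensor weights.
    H_misfit = F.T @ (w[:, None] * F) / noise_var
    return np.linalg.inv(H_misfit + np.linalg.inv(prior_cov))

def randomized_trace(apply_A, n, n_probes, rng):
    # Hutchinson-type estimator: tr(A) is approximated by the mean of
    # z^T A z over random Rademacher vectors z.
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        total += z @ apply_A(z)
    return total / n_probes

def a_optimal_objective(F, w, noise_var, prior_cov, n_probes=50, seed=0):
    # Estimated trace of the posterior covariance for sensor weights w.
    rng = np.random.default_rng(seed)
    cov = posterior_cov(F, w, noise_var, prior_cov)
    return randomized_trace(lambda z: cov @ z, cov.shape[0], n_probes, rng)
```

Only products of the covariance with probe vectors are needed by the estimator, which is what makes it attractive when the operator is available matrix-free.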

preprint 2014 arXiv

On Bayesian A- and D-optimal experimental designs in infinite dimensions

We consider Bayesian linear inverse problems in infinite-dimensional separable Hilbert spaces, with a Gaussian prior measure and additive Gaussian noise model, and provide an extension of the concept of Bayesian D-optimality to the infinite-dimensional case. To this end, we derive the infinite-dimensional version of the expression for the Kullback-Leibler divergence from the posterior measure to the prior measure, which is subsequently used to derive the expression for the expected information gain. We also study the notion of Bayesian A-optimality in the infinite-dimensional setting, and extend the well-known (in the finite-dimensional case) equivalence of the Bayes risk of the MAP estimator with the trace of the posterior covariance, for the Gaussian linear case, to the infinite-dimensional Hilbert space case.
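For orientation, the standard finite-dimensional counterparts of the quantities discussed in this abstract can be written as follows. The symbols are assumptions introduced here, not the paper's notation: $F$ is the linear parameter-to-observable map, $\Gamma_{\mathrm{pr}}$ and $\Gamma_{\mathrm{noise}}$ are the prior and noise covariances, and $\tilde{H}$ is the prior-preconditioned data misfit Hessian.

```latex
\begin{align*}
  \Gamma_{\mathrm{post}} &= \left( F^{*} \Gamma_{\mathrm{noise}}^{-1} F + \Gamma_{\mathrm{pr}}^{-1} \right)^{-1},
  \qquad
  \tilde{H} = \Gamma_{\mathrm{pr}}^{1/2} F^{*} \Gamma_{\mathrm{noise}}^{-1} F \, \Gamma_{\mathrm{pr}}^{1/2}, \\
  \text{A-optimality:} \quad
  & \mathbb{E}\left[ \| m - m_{\mathrm{MAP}}(d) \|^{2} \right] = \operatorname{tr}\left( \Gamma_{\mathrm{post}} \right), \\
  \text{D-optimality:} \quad
  & \mathbb{E}_{d}\left[ D_{\mathrm{KL}}\left( \mu_{\mathrm{post}} \,\|\, \mu_{\mathrm{pr}} \right) \right]
    = \tfrac{1}{2} \log \det \left( I + \tilde{H} \right).
\end{align*}
```

The first identity is the Bayes risk equivalence for the Gaussian linear case, and the second is the expected information gain; the abstract's contribution is the extension of both to the infinite-dimensional Hilbert space setting.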

preprint 2013 arXiv

A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with application to global seismic inversion

We present a computational framework for estimating the uncertainty in the numerical solution of linearized infinite-dimensional statistical inverse problems. We adopt the Bayesian inference formulation: given observational data and their uncertainty, the governing forward problem and its uncertainty, and a prior probability distribution describing uncertainty in the parameter field, find the posterior probability distribution over the parameter field. The prior must be chosen appropriately in order to guarantee well-posedness of the infinite-dimensional inverse problem and facilitate computation of the posterior. Furthermore, straightforward discretizations may not lead to convergent approximations of the infinite-dimensional problem. Finally, solution of the discretized inverse problem via explicit construction of the covariance matrix is prohibitive due to the need to solve the forward problem as many times as there are parameters. Our computational framework builds on the infinite-dimensional formulation proposed by Stuart (A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 19 (2010), pp. 451-559), and incorporates a number of components aimed at ensuring a convergent discretization of the underlying infinite-dimensional inverse problem. The framework additionally incorporates algorithms for manipulating the prior, constructing a low-rank approximation of the data-informed component of the posterior covariance operator, and exploring the posterior, which together ensure scalability of the entire framework to very high parameter dimensions. We demonstrate this computational framework on the Bayesian solution of an inverse problem in 3D global seismic wave propagation with hundreds of thousands of parameters.
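The low-rank treatment of the data-informed part of the posterior covariance can be sketched in a few lines for a small dense linear problem. This is a minimal illustration under stated assumptions: `F`, `prior_sqrt` (any factor with prior covariance `prior_sqrt @ prior_sqrt.T`) and the scalar `noise_var` are hypothetical inputs, and a dense eigendecomposition stands in for the matrix-free eigensolver a large-scale implementation would need.

```python
import numpy as np

def low_rank_posterior_cov(F, noise_var, prior_sqrt, rank):
    # Prior-preconditioned data misfit Hessian.
    Ht = prior_sqrt.T @ F.T @ F @ prior_sqrt / noise_var
    lam, V = np.linalg.eigh(Ht)
    # Keep the `rank` dominant eigenpairs (the data-informed directions).
    idx = np.argsort(lam)[::-1][:rank]
    lam, V = lam[idx], V[:, idx]
    # (I + Ht)^{-1} is approximated by I - V diag(lam / (1 + lam)) V^T.
    D = lam / (1.0 + lam)
    prior_cov = prior_sqrt @ prior_sqrt.T
    correction = prior_sqrt @ (V * D) @ V.T @ prior_sqrt.T
    return prior_cov - correction

def exact_posterior_cov(F, noise_var, prior_sqrt):
    # Reference formula, only feasible for small problems.
    prior_cov = prior_sqrt @ prior_sqrt.T
    return np.linalg.inv(F.T @ F / noise_var + np.linalg.inv(prior_cov))
```

When the number of observations is much smaller than the number of parameters, the preconditioned Hessian has few nonzero eigenvalues, so a small `rank` already reproduces the exact posterior covariance.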

preprint 2013 arXiv

A Nested Partitioning Scheme for Parallel Heterogeneous Clusters

Modern supercomputers increasingly rely on accelerators and co-processors. However, it has not been easy to achieve good performance on such heterogeneous clusters. The key challenge has been to ensure good load balance so that neither the CPU nor the accelerator is left idle. Traditional approaches have offloaded entire computations to the accelerator, resulting in an idle CPU, or have opted for task-level parallelism requiring large data transfers between the CPU and the accelerator. True work-parallelism has been hard because accelerators cannot communicate directly with other accelerators or with CPUs other than their host. In this work, we present a new nested partitioning scheme to overcome this problem. By partitioning the work assignment on a given node asymmetrically into boundary and interior work, and assigning the interior to the accelerator, we are able to achieve excellent efficiency while ensuring proper utilization of both the CPU and accelerator resources. The problem used for evaluating the new partition is an $hp$ discontinuous Galerkin spectral element method for a coupled elastic--acoustic wave propagation problem.
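The boundary/interior split at the heart of this scheme can be illustrated with a toy example. The sketch assumes a node owns a structured `nx` by `ny` block of elements and that all four sides of the block border other nodes; the function name and the structured layout are hypothetical simplifications, not the paper's mesh or partitioner.

```python
def nested_partition(nx, ny):
    # Elements on the edge of the node's block exchange data with
    # neighboring nodes, so they stay on the CPU; the remaining interior
    # elements need no off-node communication and go to the accelerator.
    boundary, interior = [], []
    for j in range(ny):
        for i in range(nx):
            on_edge = i in (0, nx - 1) or j in (0, ny - 1)
            (boundary if on_edge else interior).append((i, j))
    return boundary, interior

cpu_work, accelerator_work = nested_partition(4, 4)
```

In practice the split would also have to be balanced against the relative speeds of the CPU and the accelerator, which this sketch does not attempt.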

preprint 2013 arXiv

Discretely exact derivatives for hyperbolic PDE-constrained optimization problems discretized by the discontinuous Galerkin method

This paper discusses the computation of derivatives for optimization problems governed by linear hyperbolic systems of partial differential equations (PDEs) that are discretized by the discontinuous Galerkin (dG) method. An efficient and accurate computation of these derivatives is important, for instance, in inverse problems and optimal control problems. This computation is usually based on an adjoint PDE system, and the question addressed in this paper is how the discretization of this adjoint system should relate to the dG discretization of the hyperbolic state equation. Adjoint-based derivatives can be computed either before or after discretization; these two options are often referred to as the optimize-then-discretize and discretize-then-optimize approaches. We discuss the relation between these two options for dG discretizations in space and Runge-Kutta time integration. Discretely exact discretizations for several hyperbolic optimization problems are derived, including the advection equation, Maxwell's equations and the coupled elastic-acoustic wave equation. We find that the discrete adjoint equation inherits a natural dG discretization from the discretization of the state equation and that the expressions for the discretely exact gradient often have to take into account contributions from element faces. For the coupled elastic-acoustic wave equation, the correctness and accuracy of our derivative expressions are illustrated by comparisons with finite difference gradients. The results show that a straightforward discretization of the continuous gradient differs from the discretely exact gradient, and thus is not consistent with the discretized objective. This inconsistency may cause difficulties in the convergence of gradient-based algorithms for solving optimization problems.
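The idea of a discretely exact gradient, and of checking it against finite differences, can be shown on a much simpler discretization than the one in the paper. The sketch below swaps in a first-order upwind scheme with forward Euler time stepping for periodic 1D advection (not dG with Runge-Kutta), and takes the initial condition as the optimization variable; all names are illustrative assumptions.

```python
import numpy as np

def step_matrix(n, cfl):
    # One forward Euler step of periodic upwind advection:
    # u_i <- u_i - cfl * (u_i - u_{i-1}).
    return (1.0 - cfl) * np.eye(n) + cfl * np.roll(np.eye(n), 1, axis=0)

def objective(u0, A, n_steps, data):
    # Discretized objective: misfit between the final state and the data.
    u = u0.copy()
    for _ in range(n_steps):
        u = A @ u
    return 0.5 * np.sum((u - data) ** 2)

def discrete_adjoint_gradient(u0, A, n_steps, data):
    u = u0.copy()
    for _ in range(n_steps):
        u = A @ u
    p = u - data              # terminal condition of the discrete adjoint
    for _ in range(n_steps):
        p = A.T @ p           # transpose of the state step, marched backward
    return p                  # gradient of the objective with respect to u0

def finite_difference_gradient(u0, A, n_steps, data, eps=1e-6):
    g = np.zeros_like(u0)
    for i in range(u0.size):
        e = np.zeros_like(u0)
        e[i] = eps
        g[i] = (objective(u0 + e, A, n_steps, data)
                - objective(u0 - e, A, n_steps, data)) / (2.0 * eps)
    return g
```

Because the adjoint here is the exact transpose of the discrete state step, the two gradients agree up to rounding error. An adjoint discretized independently of the state scheme would generally not pass this check, which is the inconsistency the abstract warns about.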

25 published item(s)

A stochastic Stein Variational Newton method

An efficient method for goal-oriented linear Bayesian optimal experimental design: Application to optimal sensor placemen

Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport

hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty

Large-scale Bayesian optimal experimental design with derivative-informed projected neural network

Nonuniform 3D finite difference elastic wave simulation on staggered grids

Hierarchical Matrix Approximations of Hessians Arising in Inverse Problems Governed by PDEs

hIPPYlib: An Extensible Software Framework for Large-Scale Inverse Problems Governed by PDEs; Part I: Deterministic Inversion and Linearized Bayesian Inference

Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

Projected Stein Variational Gradient Descent

Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions

Stein variational reduced basis Bayesian inversion

Tensor train construction from tensor actions, with application to compression of large high order derivative tensors

Weighted BFBT Preconditioner for Stokes Flow Problems with Highly Heterogeneous Viscosity

A randomized maximum a posterior method for posterior sampling of high dimensional nonlinear Bayesian inverse problems

A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-dimensional Bayesian Nonlinear Inverse Problems

Recursive Algorithms for Distributed Forests of Octrees

Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet

Solution of nonlinear Stokes equations discretized by high-order finite elements on nonconforming and anisotropic meshes, with application to ice sheet dynamics

A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems

A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized $\ell_0$-sparsification

On Bayesian A- and D-optimal experimental designs in infinite dimensions

A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with application to global seismic inversion

A Nested Partitioning Scheme for Parallel Heterogeneous Clusters

Discretely exact derivatives for hyperbolic PDE-constrained optimization problems discretized by the discontinuous Galerkin method