Researcher profile

Thomas O'Leary-Roseberry

Thomas O'Leary-Roseberry contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Large-scale Bayesian optimal experimental design with derivative-informed projected neural network

We address the solution of large-scale Bayesian optimal experimental design (OED) problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields. The OED problem seeks to find sensor locations that maximize the expected information gain (EIG) in the solution of the underlying Bayesian inverse problem. Computation of the EIG is usually prohibitive for PDE-based OED problems. To make the evaluation of the EIG tractable, we approximate the (PDE-based) parameter-to-observable map with a derivative-informed projected neural network (DIPNet) surrogate, which exploits the geometry, smoothness, and intrinsic low-dimensionality of the map using a small and dimension-independent number of PDE solves. The surrogate is then deployed within a greedy algorithm-based solution of the OED problem such that no further PDE solves are required. We analyze the EIG approximation error in terms of the generalization error of the DIPNet and show they are of the same order. Finally, the efficiency and accuracy of the method are demonstrated via numerical experiments on OED problems governed by inverse scattering and inverse reactive transport with up to 16,641 uncertain parameters and 100 experimental design variables, where we observe up to three orders of magnitude speedup relative to a reference double loop Monte Carlo method.

preprint2020arXiv

Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.

preprint2020arXiv

Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions

We propose a fast and scalable variational method for Bayesian inference in high-dimensional parameter space, which we call projected Stein variational Newton (pSVN) method. We exploit the intrinsic low-dimensional geometric structure of the posterior distribution in the high-dimensional parameter space via its Hessian (of the log posterior) operator and perform a parallel update of the parameter samples projected into a low-dimensional subspace by an SVN method. The subspace is adaptively constructed using the eigenvectors of the averaged Hessian at the current samples. We demonstrate fast convergence of the proposed method and its scalability with respect to the number of parameters, samples, and processor cores.