Source author record

Alec M. Dunton

Alec M. Dunton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, math.NA, Numerical Analysis, astro-ph.IM, physics.comp-ph

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

Fast Gaussian Process Posterior Mean Prediction via Local Cross Validation and Precomputation

Gaussian processes (GPs) are Bayesian non-parametric models useful in a myriad of applications. Despite their popularity, the cost of GP predictions (quadratic storage and cubic complexity with respect to the number of training points) remains a hurdle in applying GPs to large data. We present a fast posterior mean prediction algorithm called FastMuyGPs to address this shortcoming. FastMuyGPs is based upon the MuyGPs hyperparameter estimation algorithm and utilizes a combination of leave-one-out cross-validation, batching, nearest neighbors sparsification, and precomputation to provide scalable, fast GP prediction. We demonstrate several benchmarks wherein FastMuyGPs prediction attains superior accuracy and competitive or superior runtime to both deep neural networks and state-of-the-art scalable GP algorithms.
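To make the nearest neighbors sparsification idea concrete, the following is a minimal sketch of a local GP posterior mean that conditions only on the k training points closest to the query, so each prediction solves a k-by-k system instead of an n-by-n one. It is not the FastMuyGPs algorithm: it omits the leave-one-out cross-validation, batching, and precomputation described in the abstract. The function names (`rbf`, `solve`, `local_posterior_mean`), the squared-exponential kernel, the scalar inputs, and all parameter values are assumptions chosen for illustration.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel for scalar inputs (an illustrative choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def local_posterior_mean(x_train, y_train, x_star, k=5,
                         length_scale=1.0, noise=1e-6):
    """Zero-mean GP posterior mean at x_star from its k nearest neighbors."""
    # Indices of the k training inputs closest to the query point.
    nn = sorted(range(len(x_train)),
                key=lambda i: abs(x_train[i] - x_star))[:k]
    xs = [x_train[i] for i in nn]
    ys = [y_train[i] for i in nn]
    # Local k-by-k kernel matrix with a noise term on the diagonal.
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)  # (K_nn + noise * I)^{-1} y_nn
    k_star = [rbf(x_star, a, length_scale) for a in xs]
    return sum(w * a for w, a in zip(k_star, alpha))


# Hypothetical data: samples of sin(x) on a regular grid.
x_train = [i * 0.1 for i in range(200)]
y_train = [math.sin(x) for x in x_train]
pred = local_posterior_mean(x_train, y_train, x_star=1.234,
                            k=8, length_scale=0.5)
```

In this sketch the cost per query depends on k rather than on the number of training points, apart from the brute-force neighbor search, which a practical implementation would replace with a spatial index.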

Preprint, 2022, arXiv

Light curve completion and forecasting using fast and scalable Gaussian processes (MuyGPs)

Temporal variations of apparent magnitude, called light curves, are observational statistics of interest captured by telescopes over long periods of time. Light curves afford the exploration of Space Domain Awareness (SDA) objectives such as object identification or pose estimation as latent variable inference problems. Ground-based observations from commercial off-the-shelf (COTS) cameras remain inexpensive compared to higher precision instruments; however, limited sensor availability combined with noisier observations can produce gappy time-series data that can be difficult to model. These external factors confound the automated exploitation of light curves, which makes light curve prediction and extrapolation a crucial problem for applications. Traditionally, image or time-series completion problems have been approached with diffusion-based or exemplar-based methods. More recently, Deep Neural Networks (DNNs) have become the tool of choice due to their empirical success at learning complex nonlinear embeddings. However, DNNs often require large training data that are not necessarily available when looking at unique features of a light curve of a single satellite. In this paper, we present a novel approach to predicting missing and future data points of light curves using Gaussian Processes (GPs). GPs are non-linear probabilistic models that infer posterior distributions over functions and naturally quantify uncertainty. However, the cubic scaling of GP inference and training is a major barrier to their adoption in applications. In particular, a single light curve can feature hundreds of thousands of observations, which is well beyond the practical realization limits of a conventional GP on a single machine. Consequently, we employ MuyGPs, a scalable framework for hyperparameter estimation of GP models that uses nearest neighbors sparsification and local cross-validation. MuyGPs...

Preprint, 2020, arXiv

Pass-efficient methods for compression of high-dimensional turbulent flow data

The future of high-performance computing, specifically on future Exascale computers, will presumably see memory capacity and bandwidth fail to keep pace with data generated, for instance, from massively parallel partial differential equation (PDE) systems. Current strategies proposed to address this bottleneck entail the omission of large fractions of data, as well as the incorporation of in situ compression algorithms to avoid overuse of memory. To ensure that post-processing operations are successful, this must be done in a way that a sufficiently accurate representation of the solution is stored. Moreover, in situations where the input/output system becomes a bottleneck in analysis, visualization, etc., or the execution of the PDE solver is expensive, the number of passes made over the data must be minimized. In the interest of addressing this problem, this work focuses on the utility of pass-efficient, parallelizable, low-rank, matrix decomposition methods in compressing high-dimensional simulation data from turbulent flows. A particular emphasis is placed on using coarse representation of the data -- compatible with the PDE discretization grid -- to accelerate the construction of the low-rank factorization. This includes the presentation of a novel single-pass matrix decomposition algorithm for computing the so-called interpolative decomposition. The methods are described extensively and numerical experiments on two turbulent channel flow data sets are performed. In the first (unladen) channel flow case, compression factors exceeding 400 are achieved while maintaining accuracy with respect to first- and second-order flow statistics. In the particle-laden case, compression factors of 100 are achieved and the compressed data is used to recover particle velocities.