Source author record

Alec Dunton

Alec Dunton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computational Engineering, Finance, and Science; Machine Learning

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators

Preprint, 2021, arXiv

Task-parallel in-situ temporal compression of large-scale computational fluid dynamics data

Present-day computational fluid dynamics simulations generate extremely large amounts of data, sometimes on the order of TB/s. Often, a significant fraction of this data is discarded because current storage systems are unable to keep pace. To address this, data compression algorithms can be applied to data arrays containing flow quantities of interest to reduce the overall amount of storage. Compression methods either exactly reconstruct the original dataset (lossless compression) or provide an approximate representation of the original dataset (lossy compression). The matrix column interpolative decomposition (ID) can be implemented as a type of lossy compression for data matrices that factors the original data matrix into a product of two smaller factor matrices. One of these matrices consists of a subset of the columns of the original data matrix, while the other is a coefficient matrix which approximates the columns of the original data matrix as linear combinations of the selected columns. Motivating this work is the observation that the structure of ID algorithms makes them a natural fit for the asynchronous nature of task-based parallelism; they are able to operate independently on sub-domains of the system of interest and, as a result, provide varied levels of compression. Using the task-based Legion programming model, a single-pass ID algorithm (SPID) for CFD applications is implemented. Performance studies, scalability, and the accuracy of the compression algorithms are presented for an analytical Taylor-Green vortex problem, followed by a large-scale implementation of a compressible Taylor-Green vortex using a high-order Navier-Stokes solver. In both cases, compression factors exceeding 100 are achieved with relative errors at or below 10e-3. Moreover, strong and weak scaling results demonstrate that introducing SPID to solvers leads to negligible increases in runtime.
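The factorization described in this abstract can be illustrated with a minimal NumPy sketch. This is not the single-pass SPID algorithm or its Legion implementation, whose details the abstract does not give. It is a plain multi-pass column ID that picks columns by greedy pivoting and fits the coefficient matrix by least squares. The function name `column_id`, the test matrix, and the rank `k` are assumptions made for illustration.

```python
import numpy as np


def column_id(A, k):
    """Illustrative rank-k column ID: A is approximated by A[:, cols] @ P.

    Columns are chosen by greedy pivoting on the residual (an assumption
    for this sketch; not the single-pass scheme named in the abstract).
    """
    R = np.array(A, dtype=float)  # working copy of the residual
    cols = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))  # column with the largest residual norm
        if norms[j] == 0.0:
            break  # residual is exactly zero; nothing left to select
        cols.append(j)
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)  # project the chosen direction out of every column
    C = A[:, cols]  # subset of the original columns
    # Coefficient matrix: least-squares fit of all columns to the selected ones.
    P, *_ = np.linalg.lstsq(C, A, rcond=None)
    return cols, P


# Hypothetical test data: a 200 x 60 matrix of exact rank 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 60))

cols, P = column_id(A, 5)
rel_err = np.linalg.norm(A - A[:, cols] @ P) / np.linalg.norm(A)
compression = A.size / (A[:, cols].size + P.size)
print(f"selected columns: {cols}")
print(f"relative error: {rel_err:.2e}, compression factor: {compression:.2f}")
```

The compression factor here is simply the number of stored entries in the original matrix divided by the entries in the two factors. For real flow data the achievable rank, and therefore the factor, depends on the data.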

Preprint, 2020, arXiv

Scaling Graph Clustering with Distributed Sketches

The unsupervised learning of community structure, in particular the partitioning of vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, as with most graph analyses, the introduction of immense scale presents challenges to traditional methods. Spectral clustering in distributed memory, for example, requires hundreds of expensive bulk-synchronous communication rounds to compute an embedding of vertices to a few eigenvectors of a graph-associated matrix. Furthermore, the whole computation may need to be repeated if even a small percentage of the underlying graph's edges change. We present a method inspired by spectral clustering where we instead use matrix sketches derived from random dimension-reducing projections. We show that our method produces embeddings that yield performant clustering results given a fully-dynamic stochastic block model stream using both the fast Johnson-Lindenstrauss and CountSketch transforms. We also discuss the effects of stochastic block model parameters upon the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
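To make the idea of a sketch-based embedding under a fully dynamic edge stream concrete, here is a minimal NumPy sketch of a CountSketch projection applied to an adjacency matrix. It is not the authors' pipeline: the abstract does not say which graph matrix is sketched or how clustering is run on the result. The class name `CountSketchEmbedding`, the choice of the plain adjacency matrix, and the toy edge list are assumptions. The point is only that the sketch is linear, so edge insertions and deletions update the embedding in place without recomputation.

```python
import numpy as np


class CountSketchEmbedding:
    """Illustrative d-dimensional vertex embeddings of an n-vertex simple graph.

    Maintains rows = A @ Omega, where A is the adjacency matrix and Omega is
    an n x d CountSketch matrix (one random +/-1 entry per row).
    """

    def __init__(self, n, d, seed=0):
        rng = np.random.default_rng(seed)
        self.bucket = rng.integers(0, d, size=n)     # hash: vertex -> sketch column
        self.sign = rng.choice([-1.0, 1.0], size=n)  # random sign per vertex
        self.rows = np.zeros((n, d))                 # one embedding row per vertex

    def update(self, u, v, weight=1.0):
        """Apply an undirected edge update; weight=-1.0 deletes an edge."""
        self.rows[u, self.bucket[v]] += weight * self.sign[v]
        self.rows[v, self.bucket[u]] += weight * self.sign[u]


# Hypothetical toy stream: five insertions followed by one deletion.
n, d = 8, 4
emb = CountSketchEmbedding(n, d)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]
for u, v in edges:
    emb.update(u, v)
emb.update(4, 5, weight=-1.0)  # delete edge (4, 5)

# Dense check: the streamed sketch equals A @ Omega for the surviving edges.
A = np.zeros((n, n))
for u, v in edges[:-1]:
    A[u, v] = A[v, u] = 1.0
Omega = np.zeros((n, d))
Omega[np.arange(n), emb.bucket] = emb.sign
print(np.allclose(emb.rows, A @ Omega))
```

Each update touches two entries of the embedding, which is the property that makes such sketches attractive for streams. Any downstream clustering step (for example, k-means on the rows) would be a separate choice not shown here.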