Source author record

Randal Burns

Randal Burns appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Computer Vision Quantitative Methods Computational Engineering, Finance, and Science Graphics Neurons and Cognition Cryptography and Security Databases Emerging Technologies Machine Learning Mathematical Software Operating Systems physics.soc-ph Social and Information Networks Systems and Control

Catalog footprint

What is connected

20works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

clusterNOR: A NUMA-Optimized Clustering Framework

Clustering algorithms are iterative and have complex data access patterns that result in many small random memory accesses. The performance of parallel implementations suffer from synchronous barriers for each iteration and skewed workloads. We rethink the parallelization of clustering for modern non-uniform memory architectures (NUMA) to maximizes independent, asynchronous computation. We eliminate many barriers, reduce remote memory accesses, and maximize cache reuse. We implement the 'Clustering NUMA Optimized Routines' (clusterNOR) extensible parallel framework that provides algorithmic building blocks. The system is generic, we demonstrate nine modern clustering algorithms that have simple implementations. clusterNOR includes (i) in-memory, (ii) semi-external memory, and (iii) distributed memory execution, enabling computation for varying memory and hardware budgets. For algorithms that rely on Euclidean distance, clusterNOR defines an updated Elkan's triangle inequality pruning algorithm that uses asymptotically less memory so that it works on billion-point data sets. clusterNOR extends and expands the scope of the 'knor' library for k-means clustering by generalizing underlying principles, providing a uniform programming interface and expanding the scope to hierarchical and linear algebraic classes of algorithms. The compound effect of our optimizations is an order of magnitude improvement in speed over other state-of-the-art solutions, such as Spark's MLlib and Apple's Turi.

preprint2021arXiv

Supervised Dimensionality Reduction for Big Data

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees.We introduce an approach, XOX, to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest ver-sion, "Linear Optimal Low-rank" projection (LOL), incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that LOL and its generalizations in the XOX framework lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of >150 million features, and several genomics datasets with>500,000 features, LOL outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while only requiring a few minutes on a standard desktop computer.

preprint2020arXiv

Observations on Porting In-memory KV stores to Persistent Memory

Systems that require high-throughput and fault tolerance, such as key-value stores and databases, are looking to persistent memory to combine the performance of in-memory systems with the data-consistent fault-tolerance of nonvolatile stores. Persistent memory devices provide fast bytea-ddressable access to non-volatile memory. We analyze the design space when integrating persistent memory into in-memory key value stores and quantify performance tradeoffs between throughput, latency, and and recovery time. Previous works have explored many design choices, but did not quantify the tradeoffs. We implement persistent memory support in Redis and Memcached, adapting the data structures of each to work in two modes: (1) with all data in persistent memory and (2) a hybrid mode that uses persistent memory for key/value data and non-volatile memory for indexing and metadata. Our experience reveals three actionable design principles that hold in Redis and Memcached, despite their very different implementations. We conclude that the hybrid design increases throughput and decreases latency at a minor cost in recovery time and code complexity

preprint2016arXiv

An SSD-based eigensolver for spectral analysis on billion-node graphs

Many eigensolvers such as ARPACK and Anasazi have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM. They run in memory of a single machine for smaller eigenvalue problems and require the distributed memory for larger problems. In contrast, we develop an SSD-based eigensolver framework called FlashEigen, which extends Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph with hundreds of millions or even billions of vertices in a single machine. FlashEigen performs sparse matrix multiplication in a semi-external memory fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in memory. We store the entire vector subspace on SSDs and reduce I/O to improve performance through caching the most recent dense matrix. Our result shows that FlashEigen is able to achieve 40%-60% performance of its in-memory implementation and has performance comparable to the Anasazi eigensolvers on a machine with 48 CPU cores. Furthermore, it is capable of scaling to a graph with 3.4 billion vertices and 129 billion edges. It takes about four hours to compute eight eigenvalues of the billion-node graph using 120 GB memory.

preprint2016arXiv

Grand Challenges for Global Brain Sciences

The next grand challenges for society and science are in the brain sciences. A collection of 60+ scientists from around the world, together with 10+ observers from national, private, and foundations, spent two days together discussing the top challenges that we could solve as a global community in the next decade. We eventually settled on three challenges, spanning anatomy, physiology, and medicine. Addressing all three challenges requires novel computational infrastructure. The group proposed the advent of The International Brain Station (TIBS), to address these challenges, and launch brain sciences to the next level of understanding.

preprint2016arXiv

Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse matrix dense matrix multiplication (SpMM) in a semi-external memory (SEM) fashion; i.e., we keep the sparse matrix on commodity SSDs and dense matrices in memory. Our SEM-SpMM incorporates many in-memory optimizations for large power-law graphs. It outperforms the in-memory implementations of Trilinos and Intel MKL and scales to billion-node graphs, far beyond the limitations of memory. Furthermore, on a single large parallel machine, our SEM-SpMM operates as fast as the distributed implementations of Trilinos using five times as much processing power. We also run our implementation in memory (IM-SpMM) to quantify the overhead of keeping data on SSDs. SEM-SpMM achieves almost 100% performance of IM-SpMM on graphs when the dense matrix has more than four columns; it achieves at least 65% performance of IM-SpMM on all inputs. We apply our SpMM to three important data analysis tasks--PageRank, eigensolving, and non-negative matrix factorization--and show that our SEM implementations significantly advance the state of the art.

preprint2015arXiv

Active Community Detection in Massive Graphs

A canonical problem in graph mining is the detection of dense communities. This problem is exacerbated for a graph with a large order and size -- the number of vertices and edges -- as many community detection algorithms scale poorly. In this work we propose a novel framework for detecting active communities that consist of the most active vertices in massive graphs. The framework is applicable to graphs having billions of vertices and hundreds of billions of edges. Our framework utilizes a parallelizable trimming algorithm based on a locality statistic to filter out inactive vertices, and then clusters the remaining active vertices via spectral decomposition on their similarity matrix. We demonstrate the validity of our method with synthetic Stochastic Block Model graphs, using Adjusted Rand Index as the performance metric. We further demonstrate its practicality and efficiency on a most recent real-world Hyperlink Web graph consisting of over 3.5 billion vertices and 128 billion edges.

preprint2015arXiv

An Automated Images-to-Graphs Framework for High Resolution Connectomics

Reconstructing a map of neuronal connectivity is a critical challenge in contemporary neuroscience. Recent advances in high-throughput serial section electron microscopy (EM) have produced massive 3D image volumes of nanoscale brain tissue for the first time. The resolution of EM allows for individual neurons and their synaptic connections to be directly observed. Recovering neuronal networks by manually tracing each neuronal process at this scale is unmanageable, and therefore researchers are developing automated image processing modules. Thus far, state-of-the-art algorithms focus only on the solution to a particular task (e.g., neuron segmentation or synapse identification). In this manuscript we present the first fully automated images-to-graphs pipeline (i.e., a pipeline that begins with an imaged volume of neural tissue and produces a brain graph without any human interaction). To evaluate overall performance and select the best parameters and methods, we also develop a metric to assess the quality of the output graphs. We evaluate a set of algorithms and parameters, searching possible operating points to identify the best available brain graph for our assessment metric. Finally, we deploy a reference end-to-end version of the pipeline on a large, publicly available data set. This provides a baseline result and framework for community analysis and future algorithm development and testing. All code and data derivatives have been made publicly available toward eventually unlocking new biofidelic computational primitives and understanding of neuropathologies.

preprint2015arXiv

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. Our semi-external memory graph engine called FlashGraph stores vertex state in memory and edge lists on SSDs. It hides latency by overlapping computation with I/O. To save I/O bandwidth, FlashGraph only accesses edge lists requested by applications from SSDs; to increase I/O throughput and reduce CPU overhead for I/O, it conservatively merges I/O requests. These designs maximize performance for applications with different I/O characteristics. FlashGraph exposes a general and flexible vertex-centric programming interface that can express a wide variety of graph algorithms and their optimizations. We demonstrate that FlashGraph in semi-external memory performs many algorithms with performance up to 80% of its in-memory implementation and significantly outperforms PowerGraph, a popular distributed in-memory graph engine.

preprint2015arXiv

Gradient-Domain Fusion for Color Correction in Large EM Image Stacks

We propose a new gradient-domain technique for processing registered EM image stacks to remove inter-image discontinuities while preserving intra-image detail. To this end, we process the image stack by first performing anisotropic smoothing along the slice axis and then solving a Poisson equation within each slice to re-introduce the detail. The final image stack is continuous across the slice axis and maintains sharp details within each slice. Adapting existing out-of-core techniques for solving the linear system, we describe a parallel algorithm with time complexity that is linear in the size of the data and space complexity that is sub-linear, allowing us to process datasets as large as five teravoxels with a 600 MB memory footprint.

preprint2015arXiv

Optimize Unsynchronized Garbage Collection in an SSD Array

Solid state disks (SSDs) have advanced to outperform traditional hard drives significantly in both random reads and writes. However, heavy random writes trigger fre- quent garbage collection and decrease the performance of SSDs. In an SSD array, garbage collection of individ- ual SSDs is not synchronized, leading to underutilization of some of the SSDs. We propose a software solution to tackle the unsyn- chronized garbage collection in an SSD array installed in a host bus adaptor (HBA), where individual SSDs are exposed to an operating system. We maintain a long I/O queue for each SSD and flush dirty pages intelligently to fill the long I/O queues so that we hide the performance imbalance among SSDs even when there are few parallel application writes. We further define a policy of select- ing dirty pages to flush and a policy of taking out stale flush requests to reduce the amount of data written to SSDs. We evaluate our solution in a real system. Experi- ments show that our solution fully utilizes all SSDs in an array under random write-heavy workloads. It improves I/O throughput by up to 62% under random workloads of mixed reads and writes when SSDs are under active garbage collection. It causes little extra data writeback and increases the cache hit rate.

preprint2015arXiv

VESICLE: Volumetric Evaluation of Synaptic Interfaces using Computer vision at Large Scale

An open challenge problem at the forefront of modern neuroscience is to obtain a comprehensive mapping of the neural pathways that underlie human brain function; an enhanced understanding of the wiring diagram of the brain promises to lead to new breakthroughs in diagnosing and treating neurological disorders. Inferring brain structure from image data, such as that obtained via electron microscopy (EM), entails solving the problem of identifying biological structures in large data volumes. Synapses, which are a key communication structure in the brain, are particularly difficult to detect due to their small size and limited contrast. Prior work in automated synapse detection has relied upon time-intensive biological preparations (post-staining, isotropic slice thicknesses) in order to simplify the problem. This paper presents VESICLE, the first known approach designed for mammalian synapse detection in anisotropic, non-post-stained data. Our methods explicitly leverage biological context, and the results exceed existing synapse detection methods in terms of accuracy and scalability. We provide two different approaches - one a deep learning classifier (VESICLE-CNN) and one a lightweight Random Forest approach (VESICLE-RF) to offer alternatives in the performance-scalability space. Addressing this synapse detection challenge enables the analysis of high-throughput imaging data soon expected to reach petabytes of data, and provide tools for more rapid estimation of brain-graphs. Finally, to facilitate community efforts, we developed tools for large-scale object detection, and demonstrated this framework to find $\approx$ 50,000 synapses in 60,000 $μm ^3$ (220 GB on disk) of electron microscopy data.

preprint2014arXiv

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes

In this paper, we present a new pipeline which automatically identifies and annotates axoplasmic reticula, which are small subcellular structures present only in axons. We run our algorithm on the Kasthuri11 dataset, which was color corrected using gradient-domain techniques to adjust contrast. We use a bilateral filter to smooth out the noise in this data while preserving edges, which highlights axoplasmic reticula. These axoplasmic reticula are then annotated using a morphological region growing algorithm. Additionally, we perform Laplacian sharpening on the bilaterally filtered data to enhance edges, and repeat the morphological region growing algorithm to annotate more axoplasmic reticula. We track our annotations through the slices to improve precision, and to create long objects to aid in segment merging. This method annotates axoplasmic reticula with high precision. Our algorithm can easily be adapted to annotate axoplasmic reticula in different sets of brain data by changing a few thresholds. The contribution of this work is the introduction of a straightforward and robust pipeline which annotates axoplasmic reticula with high precision, contributing towards advancements in automatic feature annotations in neural EM data.

preprint2014arXiv

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes using High-Resolution Neural EM Data

Accurately estimating the wiring diagram of a brain, known as a connectome, at an ultrastructure level is an open research problem. Specifically, precisely tracking neural processes is difficult, especially across many image slices. Here, we propose a novel method to automatically identify and annotate small subcellular structures present in axons, known as axoplasmic reticula, through a 3D volume of high-resolution neural electron microscopy data. Our method produces high precision annotations, which can help improve automatic segmentation by using our results as seeds for segmentation, and as cues to aid segment merging.

preprint2013arXiv

Computing Scalable Multivariate Glocal Invariants of Large (Brain-) Graphs

Graphs are quickly emerging as a leading abstraction for the representation of data. One important application domain originates from an emerging discipline called "connectomics". Connectomics studies the brain as a graph; vertices correspond to neurons (or collections thereof) and edges correspond to structural or functional connections between them. To explore the variability of connectomes---to address both basic science questions regarding the structure of the brain, and medical health questions about psychiatry and neurology---one can study the topological properties of these brain-graphs. We define multivariate glocal graph invariants: these are features of the graph that capture various local and global topological properties of the graphs. We show that the collection of features can collectively be computed via a combination of daisy-chaining, sparse matrix representation and computations, and efficient approximations. Our custom open-source Python package serves as a back-end to a Web-service that we have created to enable researchers to upload graphs, and download the corresponding invariants in a number of different formats. Moreover, we built this package to support distributed processing on multicore machines. This is therefore an enabling technology for network science, lowering the barrier of entry by providing tools to biologists and analysts who otherwise lack these capabilities. As a demonstration, we run our code on 120 brain-graphs, each with approximately 16M vertices and up to 90M edges.

preprint2013arXiv

Gradient-Domain Processing for Large EM Image Stacks

We propose a new gradient-domain technique for processing registered EM image stacks to remove the inter-image discontinuities while preserving intra-image detail. To this end, we process the image stack by first performing anisotropic diffusion to smooth the data along the slice axis and then solving a screened-Poisson equation within each slice to re-introduce the detail. The final image stack is both continuous across the slice axis (facilitating the tracking of information between slices) and maintains sharp details within each slice (supporting automatic feature detection). To support this editing, we describe the implementation of the first multigrid solver designed for efficient gradient domain processing of large, out-of-core, voxel grids.

preprint2013arXiv

MIGRAINE: MRI Graph Reliability Analysis and Inference for Connectomics

Currently, connectomes (e.g., functional or structural brain graphs) can be estimated in humans at $\approx 1~mm^3$ scale using a combination of diffusion weighted magnetic resonance imaging, functional magnetic resonance imaging and structural magnetic resonance imaging scans. This manuscript summarizes a novel, scalable implementation of open-source algorithms to rapidly estimate magnetic resonance connectomes, using both anatomical regions of interest (ROIs) and voxel-size vertices. To assess the reliability of our pipeline, we develop a novel nonparametric non-Euclidean reliability metric. Here we provide an overview of the methods used, demonstrate our implementation, and discuss available user extensions. We conclude with results showing the efficacy and reliability of the pipeline over previous state-of-the-art.

preprint2013arXiv

The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes---neural connectivity maps of the brain---using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at http://openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems---reads to parallel disk arrays and writes to solid-state storage---to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.

preprint2011arXiv

The Life and Death of Unwanted Bits: Towards Proactive Waste Data Management in Digital Ecosystems

Our everyday data processing activities create massive amounts of data. Like physical waste and trash, unwanted and unused data also pollutes the digital environment by degrading the performance and capacity of storage systems and requiring costly disposal. In this paper, we propose using the lessons from real life waste management in handling waste data. We show the impact of waste data on the performance and operational costs of our computing systems. To allow better waste data management, we define a waste hierarchy for digital objects and provide insights into how to identify and categorize waste data. Finally, we introduce novel ways of reusing, reducing, and recycling data and software to minimize the impact of data wastage

preprint2011arXiv

Where Have You Been? Secure Location Provenance for Mobile Devices

With the advent of mobile computing, location-based services have recently gained popularity. Many applications use the location provenance of users, i.e., the chronological history of the users' location for purposes ranging from access control, authentication, information sharing, and evaluation of policies. However, location provenance is subject to tampering and collusion attacks by malicious users. In this paper, we examine the secure location provenance problem. We introduce a witness-endorsed scheme for generating collusion-resistant location proofs. We also describe two efficient and privacy-preserving schemes for protecting the integrity of the chronological order of location proofs. These schemes, based on hash chains and Bloom filters respectively, allow users to prove the order of any arbitrary subsequence of their location history to auditors. Finally, we present experimental results from our proof-of-concept implementation on the Android platform and show that our schemes are practical in today's mobile devices.

Randal Burns

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

clusterNOR: A NUMA-Optimized Clustering Framework

Supervised Dimensionality Reduction for Big Data

Observations on Porting In-memory KV stores to Persistent Memory

An SSD-based eigensolver for spectral analysis on billion-node graphs

Grand Challenges for Global Brain Sciences

Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Active Community Detection in Massive Graphs

An Automated Images-to-Graphs Framework for High Resolution Connectomics

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Gradient-Domain Fusion for Color Correction in Large EM Image Stacks

Optimize Unsynchronized Garbage Collection in an SSD Array

VESICLE: Volumetric Evaluation of Synaptic Interfaces using Computer vision at Large Scale

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes

Automatic Annotation of Axoplasmic Reticula in Pursuit of Connectomes using High-Resolution Neural EM Data

Computing Scalable Multivariate Glocal Invariants of Large (Brain-) Graphs

Gradient-Domain Processing for Large EM Image Stacks

MIGRAINE: MRI Graph Reliability Analysis and Inference for Connectomics

The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

The Life and Death of Unwanted Bits: Towards Proactive Waste Data Management in Digital Ecosystems

Where Have You Been? Secure Location Provenance for Mobile Devices