Source author record

Jeremy Kepner

Jeremy Kepner appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

54works

29topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Improving the Graph Challenge Reference Implementation

The MIT/IEEE/Amazon Graph Challenge provides a venue for individuals and teams to showcase new innovations in large-scale graph and sparse data analysis. The Anonymized Network Sensing Graph Challenge processes over 100 billion network packets to construct privacy-preserving traffic matrices, with a GraphBLAS reference implementation demonstrating how hypersparse matrices can be applied to this problem. This work presents a refactoring and benchmarking of a section of the reference code to improve clarity, adaptability, and performance. The original Python implementation spanning approximately 1000 lines across 3 files has been streamlined to 325 lines across two focused modules, achieving a 67% reduction in code size while maintaining full functionality. Using pMatlab and pPython distributed array programming libraries, the addition of parallel maps allowed for parallel benchmarking of the data. Scalable performance is demonstrated for large-scale summation and analysis of traffic matrices. The resulting implementation increases the potential impact of the Graph Challenge by providing a clear and efficient foundation for participants.

preprint2023arXiv

pPython Performance Study

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single-node (e.g., a laptop) running Windows, Linux, or MacOS operating systems or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface so that pPython can be executed in a massively parallel computing environment. It is interesting to see what performance pPython can achieve compared to the traditional socket-based MPI communication because of its unique file-based messaging implementation. In this paper, we present the point-to-point and collective communication performances of pPython and compare them with those obtained by using mpi4py with OpenMPI. For large messages, pPython demonstrates comparable performance as compared to mpi4py.

preprint2022arXiv

Developing a Series of AI Challenges for the United States Department of the Air Force

Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.

preprint2022arXiv

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hyperspace traffic matrices that preserve anonymization while enabling subrange analysis. Standard multitemporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet) enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).

preprint2022arXiv

Temporal Correlation of Internet Observatories and Outposts

The Internet has become a critical component of modern civilization requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic are essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emit Internet traffic and provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hyperspace matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period 70\% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (reduce frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of the brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high frequency beam of sources that drifts on a time scale of a month.

preprint2022arXiv

The MIT Supercloud Workload Classification Challenge

High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogenous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly larger share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website : https://dcc.mit.edu.

preprint2022arXiv

Zero Botnets: An Observe-Pursue-Counter Approach

Adversarial Internet robots (botnets) represent a growing threat to the safe use and stability of the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations. Reducing the presence of botnets on the Internet, with the aspirational target of zero, is a powerful vision for galvanizing policy action. Setting a global goal, encouraging international cooperation, creating incentives for improving networks, and supporting entities for botnet takedowns are among several policies that could advance this goal. These policies raise significant questions regarding proper authorities/access that cannot be answered in the abstract. Systems analysis has been widely used in other domains to achieve sufficient detail to enable these questions to be dealt with in concrete terms. Defeating botnets using an observe-pursue-counter architecture is analyzed, the technical feasibility is affirmed, and the authorities/access questions are significantly narrowed. Recommended next steps include: supporting the international botnet takedown community, expanding network observatories, enhancing the underlying network science at scale, conducting detailed systems analysis, and developing appropriate policy frameworks.

preprint2021arXiv

3D Real-Time Supercomputer Monitoring

Supercomputers are complex systems producing vast quantities of performance data from multiple sources and of varying types. Performance data from each of the thousands of nodes in a supercomputer tracks multiple forms of storage, memory, networks, processors, and accelerators. Optimization of application performance is critical for cost effective usage of a supercomputer and requires efficient methods for effectively viewing performance data. The combination of supercomputing analytics and 3D gaming visualization enables real-time processing and visual data display of massive amounts of information that humans can process quickly with little training. Our system fully utilizes the capabilities of modern 3D gaming environments to create novel representations of computing hardware which intuitively represent the physical attributes of the supercomputer while displaying real-time alerts and component utilization. This system allows operators to quickly assess how the supercomputer is being used, gives users visibility into the resources they are consuming, and provides instructors new ways to interactively teach the computing architecture concepts necessary for efficient computing

preprint2020arXiv

75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices

The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrices. The parameters of hierarchical hypersparse matrices rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.

preprint2020arXiv

Accuracy and Performance Comparison of Video Action Recognition Approaches

Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remain clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen off-the-shelf and state-of-the-art models by ensuring consistency in these training characteristics in order to provide readers with a meaningful comparison across different types of video action recognition algorithms. Accuracy of the models is evaluated using standard Top-1 and Top-5 accuracy metrics in addition to a proposed new accuracy metric. Additionally, we compare computational performance of distributed training from two to sixty-four GPUs on a state-of-the-art HPC system.

preprint2020arXiv

AI Data Wrangling with Associative Arrays

The AI revolution is data driven. AI "data wrangling" is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data representations supporting the many query and analysis steps found in an AI pipeline. Rigorous mathematical representations of these data enables data translation and analysis optimization within and across steps. Associative array algebra provides a mathematical foundation that naturally describes the tabular structures and set mathematics that are the basis of databases. Likewise, the matrix operations and corresponding inference/training calculations used by neural networks are also well described by associative arrays. More surprisingly, a general denormalized form of hierarchical formats, such as XML and JSON, can be readily constructed. Finally, pivot tables, which are among the most widely used data analysis tools, naturally emerge from associative array constructors. A common foundation in associative arrays provides interoperability guarantees, proving that their operations are linear systems with rigorous mathematical properties, such as, associativity, commutativity, and distributivity that are critical to reordering optimizations.

preprint2020arXiv

DBOS: A Proposal for a Data-Centric Operating System

Current operating systems are complex systems that were designed before today's computing environments. This makes it difficult for them to meet the scalability, heterogeneity, availability, and security challenges in current cloud and parallel computing environments. To address these problems, we propose a radically new OS design based on data-centric architecture: all operating system state should be represented uniformly as database tables, and operations on this state should be made via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring, inspect and debug system state, upgrade components without downtime, manage decisions using machine learning, and implement sophisticated security features. We discuss how a database OS (DBOS) can improve the programmability and performance of many of today's most important applications and propose a plan for the development of a DBOS proof of concept.

preprint2020arXiv

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.

preprint2020arXiv

Survey of Machine Learning Accelerators

New machine learning accelerators are being announced and released each month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of of AI accelerators and processors from last year's IEEE-HPEC paper. This paper collects and summarizes the current accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. This year, there are many more announced accelerators that are implemented with many more architectures and technologies from vector engines, dataflow engines, neuromorphic designs, flash-based analog memory processing, and photonic-based processing.

preprint2020arXiv

Technical Report: Developing a Working Data Hub

Data forms a key component of any enterprise. The need for high quality and easy access to data is further amplified by organizations wishing to leverage machine learning or artificial intelligence for their operations. To this end, many organizations are building resources for managing heterogenous data, providing end-users with an organization wide view of available data, and acting as a centralized repository for data owned/collected by an organization. Very broadly, we refer to these class of techniques as a "data hub." While there is no clear definition of what constitutes a data hub, some of the key characteristics include: data catalog; links to data sets or owners of data sets or centralized data repository; basic ability to serve / visualize data sets; access control policies that ensure secure data access and respects policies of data owners; and computing capabilities tied with data hub infrastructure. Of course, developing such a data hub entails numerous challenges. This document provides background in databases, data management and outlines best practices and recommendations for developing and deploying a working data hub.

preprint2016arXiv

Associative Array Model of SQL, NoSQL, and NewSQL Databases

The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Associative arrays provide a common approach to the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL. These properties include associativity, commutativity, distributivity, identities, annihilators, and inverses. Performance measurements on distributivity and associativity show the impact these properties can have on associative array operations. These results demonstrate that associative arrays could provide a mathematical model for polystores to optimize the exchange of data and execution queries.

preprint2016arXiv

Benchmarking SciDB Data Import on HPC Systems

SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced analytics in database, thus reducing the need for extracting data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of massive datasets compared to the traditional approaches of reading volumetric data from individual files. This work describes the D4M and SciDB tools we developed and presents the initial performance results. This performance was achieved by using parallel inserts, a in-database merging of arrays as well as supercomputing techniques, such as distributed arrays and single-program-multiple-data programming.

preprint2016arXiv

Benchmarking the Graphulo Processing Framework

Graph algorithms have wide applicablity to a variety of domains and are often used on massive datasets. Recent standardization efforts such as the GraphBLAS specify a set of key computational kernels that hardware and software developers can adhere to. Graphulo is a processing framework that enables GraphBLAS kernels in the Apache Accumulo database. In our previous work, we have demonstrated a core Graphulo operation called \textit{TableMult} that performs large-scale multiplication operations of database tables. In this article, we present the results of scaling the Graphulo engine to larger problems and scalablity when a greater number of resources is used. Specifically, we present two experiments that demonstrate Graphulo scaling performance is linear with the number of available resources. The first experiment demonstrates cluster processing rates through Graphulo's TableMult operator on two large graphs, scaled between $2^{17}$ and $2^{19}$ vertices. The second experiment uses TableMult to extract a random set of rows from a large graph ($2^{19}$ nodes) to simulate a cued graph analytic. These benchmarking results are of relevance to Graphulo users who wish to apply Graphulo to their graph problems.

preprint2016arXiv

Enhancing HPC Security with a User-Based Firewall

HPC systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network must provide their own authentication. Many methods have been employed to perform this authentication. However, support for all of these methods requires the HPC application developer to include support and the user to configure and enable these services. The user-based firewall capability we have prototyped enables a set of rules governing connections across the HPC internal network to be put into place using Linux netfilter. By using an operating system-level capability, the system is not reliant on any developer or user actions to enable security. The rules we have chosen and implemented are crafted to not impact the vast majority of users and be completely invisible to them.

preprint2016arXiv

From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database

Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how it is possible to implement the GraphBLAS kernels in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results shed insight into the conditions that determine when executing a graph algorithm is faster inside a database versus an external system---in short, that memory requirements and relative I/O are critical factors.

preprint2016arXiv

Julia Implementation of the Dynamic Distributed Dimensional Data Model

Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.

preprint2016arXiv

LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming capability in one line of code. LLMapReduce supports all programming languages and many schedulers. LLMapReduce can work with any application without the need to modify the application. Furthermore, LLMapReduce can overcome scaling limits in the map-reduce parallel programming model via options that allow the user to switch to the more efficient single-program-multiple-data (SPMD) parallel programming model. These features allow users to reduce the computational overhead by more than 10x compared to standard map-reduce for certain applications. LLMapReduce is widely used by hundreds of users at MIT. Currently LLMapReduce works with several schedulers such as SLURM, Grid Engine and LSF.

preprint2016arXiv

Novel Graph Processor Architecture, Prototype System, and Results

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's graph-processor architecture represents a rethinking of parallel architectures for graph problems. Our processor utilizes innovations that include a sparse matrix-based graph instruction set, a cacheless memory system, accelerator-based architecture, a systolic sorter, high-bandwidth multi-dimensional toroidal communication network, and randomized communications. A field-programmable gate array (FPGA) prototype of the new graph processor has been developed with significant performance enhancement over conventional processors in graph computational throughput.

preprint2016arXiv

PageRank Pipeline Benchmark: Proposal for a Holistic System Benchmark for Big-Data Platforms

The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these challenges for decades and developed methodologies for creating rigorous scalable benchmarks (e.g., HPC Challenge). The proposed PageRank pipeline benchmark employs supercomputing benchmarking methodologies to create a scalable benchmark that is reflective of many real-world big data processing systems. The PageRank pipeline benchmark builds on existing prior scalable benchmarks (Graph500, Sort, and PageRank) to create a holistic benchmark with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. The linear algebraic nature of PageRank makes it well suited to being implemented using the GraphBLAS standard. The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed PageRank pipeline benchmark is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance has been measured.

preprint2016arXiv

Scalability of VM Provisioning Systems

Virtual machines and virtualized hardware have been around for over half a century. The commoditization of the x86 platform and its rapidly growing hardware capabilities have led to recent exponential growth in the use of virtualization both in the enterprise and high performance computing (HPC). The startup time of a virtualized environment is a key performance metric for high performance computing in which the runtime of any individual task is typically much shorter than the lifetime of a virtualized service in an enterprise context. In this paper, a methodology for accurately measuring the startup performance on an HPC system is described. The startup performance overhead of three of the most mature, widely deployed cloud management frameworks (OpenStack, OpenNebula, and Eucalyptus) is measured to determine their suitability for workloads typically seen in an HPC environment. A 10x performance difference is observed between the fastest (Eucalyptus) and the slowest (OpenNebula) framework. This time difference is primarily due to delays in waiting on networking in the cloud-init portion of the startup. The methodology and measurements presented should facilitate the optimization of startup across a variety of virtualization environments.

preprint2016arXiv

The BigDAWG Architecture

BigDAWG is a polystore system designed to work on complex problems that naturally span across different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, support for the competing notions of location transparency and semantic completeness via islands of information and a middleware that provides a uniform multi-island interface. In this article, we describe the current architecture of BigDAWG, its application on the MIMIC II medical dataset, and our plans for the mechanics of cross-system queries. During the presentation, we will also deliver a brief demonstration of the current version of BigDAWG.

preprint2016arXiv

The BigDAWG Polystore System and Architecture

Organizations are often faced with the challenge of providing data management solutions for large, heterogenous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As a part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working Group) is a polystore system designed to work on complex problems that naturally span across different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, support for the competing notions of location transparency and semantic completeness via islands and a middleware that provides a uniform multi--island interface. Initial results from a prototype of the BigDAWG system applied to a medical dataset validate polystore concepts. In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans.

preprint2015arXiv

Algebraic Conditions for Generating Accurate Adjacency Arrays

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. Associative arrays unify and simplify these different approaches into a common two-dimensional view of data. Graph construction, a fundamental operation in the data processing pipeline, is typically done by multiplying the incidence array representations of a graph, $\mathbf{E}_\mathrm{in}$ and $\mathbf{E}_\mathrm{out}$, to produce an adjacency matrix of the graph that can be processed with a variety of machine learning clustering techniques. This work focuses on establishing the mathematical criteria to ensure that the matrix product $\mathbf{E}_\mathrm{out}^\intercal\mathbf{E}_\mathrm{in}$ is the adjacency array of the graph. It will then be shown that these criteria are also necessary and sufficient for the remaining nonzero product of incidence arrays, $\mathbf{E}_\mathrm{in}^\intercal\mathbf{E}_\mathrm{out}$ to be the adjacency matrices of the reversed graph. Algebraic structures that comply with the criteria will be identified and discussed.

preprint2015arXiv

Associative Arrays: Unified Mathematics for Spreadsheets, Databases, Matrices, and Graphs

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. The common theme amongst these views is the need to store and operate on data as whole sets instead of as individual data elements. This work describes a common mathematical representation of these data sets (associative arrays) that applies across a wide range of applications and technologies. Associative arrays unify and simplify these different approaches for representing and manipulating data into common two-dimensional view of data. Specifically, associative arrays (1) reduce the effort required to pass data between steps in a data processing system, (2) allow steps to be interchanged with full confidence that the results will be unchanged, and (3) make it possible to recognize when steps can be simplified or eliminated. Most database system naturally support associative arrays via their tabular interfaces. The D4M implementation of associative arrays uses this feature to provide a common interface across SQL, NoSQL, and NewSQL databases.

preprint2015arXiv

Big Data Strategies for Data Center Infrastructure Management Using a 3D Gaming Platform

High Performance Computing (HPC) is intrinsically linked to effective Data Center Infrastructure Management (DCIM). Cloud services and HPC have become key components in Department of Defense and corporate Information Technology competitive strategies in the global and commercial spaces. As a result, the reliance on consistent, reliable Data Center space is more critical than ever. The costs and complexity of providing quality DCIM are constantly being tested and evaluated by the United States Government and companies such as Google, Microsoft and Facebook. This paper will demonstrate a system where Big Data strategies and 3D gaming technology is leveraged to successfully monitor and analyze multiple HPC systems and a lights-out modular HP EcoPOD 240a Data Center on a singular platform. Big Data technology and a 3D gaming platform enables the relative real time monitoring of 5000 environmental sensors, more than 3500 IT data points and display visual analytics of the overall operating condition of the Data Center from a command center over 100 miles away. In addition, the Big Data model allows for in depth analysis of historical trends and conditions to optimize operations achieving even greater efficiencies and reliability.

preprint2015arXiv

Computing on Masked Data to improve the Security of Big Data

Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need for improving the security of data stored in such untrusted servers or databases. Advances in cryptographic techniques and database technologies provide the necessary security functionality but rely on a computational model in which the cloud is used solely for storage and retrieval. Much of big data computation and analytics make use of signal processing fundamentals for computation. As the trend of moving data storage and computation to the cloud increases, homeland security missions should understand the impact of security on key signal processing kernels such as correlation or thresholding. In this article, we propose a tool called Computing on Masked Data (CMD), which combines advances in database technologies and cryptographic tools to provide a low overhead mechanism to offload certain mathematical operations securely to the cloud. This article describes the design and development of the CMD tool.

preprint2015arXiv

Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable.

preprint2015arXiv

Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we implement GraphBLAS sparse matrix multiplication server-side by leveraging Accumulo's native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulo's peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo.

preprint2015arXiv

Graphulo: Linear Algebra Graph Kernels for NoSQL Databases

Big data and the Internet of Things era continue to challenge computational systems. Several technology solutions such as NoSQL databases have been developed to deal with this challenge. In order to generate meaningful results from large datasets, analysts often use a graph representation which provides an intuitive way to work with the data. Graph vertices can represent users and events, and edges can represent the relationship between vertices. Graph algorithms are used to extract meaningful information from these very large graphs. At MIT, the Graphulo initiative is an effort to perform graph algorithms directly in NoSQL databases such as Apache Accumulo or SciDB, which have an inherently sparse data storage scheme. Sparse matrix operations have a history of efficient implementations and the Graph Basic Linear Algebra Subprogram (GraphBLAS) community has developed a set of key kernels that can be used to develop efficient linear algebra operations. However, in order to use the GraphBLAS kernels, it is important that common graph algorithms be recast using the linear algebra building blocks. In this article, we look at common classes of graph algorithms and recast them into linear algebra operations using the GraphBLAS building blocks.

preprint2015arXiv

Improving Big Data Visual Analytics with Interactive Virtual Reality

For decades, the growth and volume of digital data collection has made it challenging to digest large volumes of information and extract underlying structure. Coined 'Big Data', massive amounts of information has quite often been gathered inconsistently (e.g from many sources, of various forms, at different rates, etc.). These factors impede the practices of not only processing data, but also analyzing and displaying it in an efficient manner to the user. Many efforts have been completed in the data mining and visual analytics community to create effective ways to further improve analysis and achieve the knowledge desired for better understanding. Our approach for improved big data visual analytics is two-fold, focusing on both visualization and interaction. Given geo-tagged information, we are exploring the benefits of visualizing datasets in the original geospatial domain by utilizing a virtual reality platform. After running proven analytics on the data, we intend to represent the information in a more realistic 3D setting, where analysts can achieve an enhanced situational awareness and rely on familiar perceptions to draw in-depth conclusions on the dataset. In addition, developing a human-computer interface that responds to natural user actions and inputs creates a more intuitive environment. Tasks can be performed to manipulate the dataset and allow users to dive deeper upon request, adhering to desired demands and intentions. Due to the volume and popularity of social media, we developed a 3D tool visualizing Twitter on MIT's campus for analysis. Utilizing emerging technologies of today to create a fully immersive tool that promotes visualization and interaction can help ease the process of understanding and representing big data.

preprint2015arXiv

Large Enforced Sparse Non-Negative Matrix Factorization

Non-negative matrix factorization (NMF) is a common method for generating topic models from text data. NMF is widely accepted for producing good results despite its relative simplicity of implementation and ease of computation. One challenge with applying NMF to large datasets is that intermediate matrix products often become dense, stressing the memory and compute elements of a system. In this article, we investigate a simple but powerful modification of a common NMF algorithm that enforces the generation of sparse intermediate and output matrices. This method enables the application of NMF to large datasets through improved memory and compute performance. Further, we demonstrate empirically that this method of enforcing sparsity in the NMF either preserves or improves both the accuracy of the resulting topic model and the convergence rate of the underlying algorithm.

preprint2015arXiv

Lustre, Hadoop, Accumulo

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. There are a wide variety of technologies that can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and the most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to loose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10,000x lower latency on random lookups than either Lustre or Hadoop but Accumulo's bulk bandwidth is 10x less. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways.

preprint2015arXiv

Parallel Vectorized Algebraic AES in MATLAB for Rapid Prototyping of Encrypted Sensor Processing Algorithms and Database Analytics

The increasing use of networked sensor systems and networked databases has led to an increased interest in incorporating encryption directly into sensor algorithms and database analytics. MATLAB is the dominant tool for rapid prototyping of sensor algorithms and has extensive database analytics capabilities. The advent of high level and high performance Galois Field mathematical environments allows encryption algorithms to be expressed succinctly and efficiently. This work leverages the Galois Field primitives found the MATLAB Communication Toolbox to implement a mode of the Advanced Encrypted Standard (AES) based on first principals mathematics. The resulting implementation requires 100x less code than standard AES implementations and delivers speed that is effective for many design purposes. The parallel version achieves speed comparable to native OpenSSL on a single node and is sufficient for real-time prototyping of many sensor processing algorithms and database analytics.

preprint2015arXiv

Using a Big Data Database to Identify Pathogens in Protein Data Space

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors / false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate goal of rapid pathogen identification in patient samples. Our approach uses the abilities of a big data databases to hold large sparse associative array representations of genetic data to extract statistical patterns about the data that can be used in a variety of ways to improve identification algorithms.

preprint2014arXiv

Achieving 100,000,000 database inserts per second using Accumulo and D4M

The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.

preprint2014arXiv

Big Data Dimensional Analysis

The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. One of the main challenges associated with big data variety is automatically understanding the underlying structures and patterns of the data. Such an understanding is required as a pre-requisite to the application of advanced analytics to the data. Further, big data sets often contain anomalies and errors that are difficult to know a priori. Current approaches to understanding data structure are drawn from the traditional database ontology design. These approaches are effective, but often require too much human involvement to be effective for the volume, velocity and variety of data encountered by big data systems. Dimensional Data Analysis (DDA) is a proposed technique that allows big data analysts to quickly understand the overall structure of a big dataset, determine anomalies. DDA exploits structures that exist in a wide class of data to quickly determine the nature of the data and its statical anomalies. DDA leverages existing schemas that are employed in big data databases today. This paper presents DDA, applies it to a number of data sets, and measures its performance. The overhead of DDA is low and can be applied to existing big data systems without greatly impacting their computing requirements.

preprint2014arXiv

Computing on Masked Data: a High Performance Method for Improving Big Data Veracity

The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support of sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.

preprint2014arXiv

D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database

Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed Dimensional Data Model (D4M)[http://d4m.mit.edu] provides a uniform mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases. For non-traditional databases D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics, scientific citation, free text, and social media data. The D4M 2.0 Schema is simple, requires minimal parsing, and achieves the highest published Accumulo ingest rates. The benefits of the D4M 2.0 Schema are independent of the D4M interface. Any interface to Accumulo can achieve these benefits by using the D4M 2.0 Schema

preprint2014arXiv

Genetic Sequence Matching Using D4M Big Data Approaches

Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M) - an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and the implementation of an Apache Acculumo database to accelerate computations one-hundred fold over other methods. Comparisons of the D4M method with the current gold-standard for sequence analysis, BLAST, show the two are comparable in the alignments they find. This paper will present an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.

preprint2014arXiv

Percolation Model of Insider Threats to Assess the Optimum Number of Rules

Rules, regulations, and policies are the basis of civilized society and are used to coordinate the activities of individuals who have a variety of goals and purposes. History has taught that over-regulation (too many rules) makes it difficult to compete and under-regulation (too few rules) can lead to crisis. This implies an optimal number of rules that avoids these two extremes. Rules create boundaries that define the latitude an individual has to perform their activities. This paper creates a Toy Model of a work environment and examines it with respect to the latitude provided to a normal individual and the latitude provided to an insider threat. Simulations with the Toy Model illustrate four regimes with respect to an insider threat: under-regulated, possibly optimal, tipping-point, and over-regulated. These regimes depend up the number of rules (N) and the minimum latitude (Lmin) required by a normal individual to carry out their activities. The Toy Model is then mapped onto the standard 1D Percolation Model from theoretical physics and the same behavior is observed. This allows the Toy Model to be generalized to a wide array of more complex models that have been well studied by the theoretical physics community and also show the same behavior. Finally, by estimating N and Lmin it should be possible to determine the regime of any particular environment.

preprint2014arXiv

Standards for Graph Algorithm Primitives

It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position paper defining the problem and announcing our intention to launch an open effort to define this standard.

preprint2006arXiv

pMatlab Parallel Matlab Library

MATLAB has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with ~1,000,000 users worldwide. The compute intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab (www.ll.mit.edu/pMatlab) provides this capability by implementing Parallel Global Array Semantics (PGAS) using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a map construct. Communication operations between distributed arrays are abstracted away from the user and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions up to four dimensions. pMatlab is built on top of the MatlabMPI communication library (www.ll.mit.edu/MatlabMPI) and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOSX, and SunOS. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI at usually 1/10th the code size. Finally, we present implementation data collected from a sample of 10 real pMatlab applications drawn from the ~100 users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to a well performing pMatlab code in about 3 hours while changing less than 1% of their code.

preprint2003arXiv

MatlabMPI

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI ``look and feel'' on top of standard Matlab file I/O, resulting in an extremely compact (~250 lines of code) and ``pure'' implementation which runs anywhere Matlab runs, and on any heterogeneous combination of computers. The performance has been tested on both shared and distributed memory parallel computers (e.g. Sun, SGI, HP, IBM, Linux and MacOSX). MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software-lines-of-code, illustrating the high productivity of this approach. MatlabMPI is available for download on the web (www.ll.mit.edu/MatlabMPI).

preprint2001arXiv

A Multi-Threaded Fast Convolver for Dynamically Parallel Image Filtering

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally ``pave'' with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments). This paper describes a fast (FFT based) method for convolving images, which is also well suited to very large images. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within in a high level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and has the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using a fully portable code.

preprint2001arXiv

Exploiting VSIPL and OpenMP for Parallel Image Processing

VSIPL and OpenMP are two open standards for portable high performance computing. VSIPL delivers optimized single processor performance while OpenMP provides a low overhead mechanism for executing thread based parallelism on shared memory systems. Image processing is one of the main areas where VSIPL and OpenMP can make a large impact. Currently, a large fraction of image processing applications are written in the Interpreted Data Language (IDL) environment. The aim of this work is to demonstrate that the performance benefits of these new standards can be brought to image processing community in a high level manner that is transparent to users. To this end, this talk presents a fast, FFT based algorithm for performing image convolutions. This algorithm has been implemented within the IDL environment using VSIPL (for optimized single processor performance) with added OpenMP directives (for parallelism). This work demonstrates that good parallel speedups are attainable using standards and can be integrated seamlessly into existing user environments.

preprint2000arXiv

Cluster Detection in Astronomical Databases: the Adaptive Matched Filter Algorithm and Implementation

Clusters of galaxies are the most massive objects in the Universe and mapping their location is an important astronomical problem. This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters of galaxies in large astronomical databases. The Adaptive Matched Filter (AMF) algorithm presented here identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a cluster model and a background model. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.

preprint1999arXiv

An Automated Cluster Finder: the Adaptive Matched Filter

We describe an automated method for detecting clusters of galaxies in imaging and redshift galaxy surveys. The Adaptive Matched Filter (AMF) method utilizes galaxy positions, magnitudes, and---when available---photometric or spectroscopic redshifts to find clusters and determine their redshift and richness. The AMF can be applied to most types of galaxy surveys: from two-dimensional (2D) imaging surveys, to multi-band imaging surveys with photometric redshifts of any accuracy (2.5D), to three-dimensional (3D) redshift surveys. The AMF can also be utilized in the selection of clusters in cosmological N-body simulations. The AMF identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a model of the cluster and field galaxy distributions. In tests on simulated 2D and 2.5D data with a magnitude limit of r' ~ 23.5, clusters are detected with an accuracy of Delta z ~ 0.02 in redshift and ~10% in richness to z < 0.5. Detecting clusters at higher redshifts is possible with deeper surveys. In this paper we present the theory behind the AMF and describe test results on synthetic galaxy catalogs.

preprint1999arXiv

Inside-out Galaxy Formation

Current theories of galaxy formation have tended to focus on hierarchical structure formation, which is the most likely scenario for cosmological models with lots of power at small scales (e.g. standard cold dark matter). Models with little small scale power lead to scenarios closer to spherical collapse. Recently favored power spectra (e.g. CDM+Lambda) lie somewhere in between suggesting that both types of processes are important and may vary over time due to gaseous reheating. From this viewpoint this paper explores a very simple inside out scenario for galaxy formation. This scenario is a natural result of synthesizing earlier work on DM halos, spherical collapse, and gas redistribution via angular momentum. Although, this model is highly simplified and is not designed to accurately describe the detailed formation of any individual galaxy, it does (by design) predict the overall features of galaxies. In addition, old bulges and young disks are an almost unavoidable result of these very simple models. This scenario may provide a useful framework for both observers and theoreticians to think about galaxy formation.

preprint1999arXiv

Interfacing Interpreted and Compiled Languages to Support Applications on a Massively Parallel Network of Workstations (MP-NOW)

Astronomers are increasingly using Massively Parallel Network of Workstations (MP-NOW) to address their most challenging computing problems. Fully exploiting these systems is made more difficult as more and more modeling and data analysis software is written in interpreted languages (such as IDL, MATLAB, and Mathematica) which do not lend themselves to parallel computing. We present a specific example of a very simple, but generic solution to this problem. Our example uses an interpreted language (IDL) to set up a calculation and then interfaces with a computational kernel written in a compiled language (C). The IDL code then calls the C code as an external library. We have added to the computational kernel an additional layer, which manages multiple copies of the kernel running on a MP-NOW and returns the results back to the interpreted layer. Our implementation uses The Next generation Taskbag (TNT) library developed at Sarnoff to provide an efficient means for implementing task parallelism. A test problem (taken from Astronomy) has been implemented on the Sarnoff Cyclone computer which consists of 160 heterogeneous nodes connected by a ``fat'' tree 100 Mb/s switched Ethernet running the RedHat Linux and FreeBSD operating systems. Our first results in this ongoing project have demonstrated the feasibility of this approach and produced speedups of greater than 50 on 60 processors.

Jeremy Kepner

What is connected

Connect this record

See the researcher in context

Building this map preview

54 published item(s)

Improving the Graph Challenge Reference Implementation

pPython Performance Study

Developing a Series of AI Challenges for the United States Department of the Air Force

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Temporal Correlation of Internet Observatories and Outposts

The MIT Supercloud Workload Classification Challenge

Zero Botnets: An Observe-Pursue-Counter Approach

3D Real-Time Supercomputer Monitoring

75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices

Accuracy and Performance Comparison of Video Action Recognition Approaches

AI Data Wrangling with Associative Arrays

DBOS: A Proposal for a Data-Centric Operating System

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

Survey of Machine Learning Accelerators

Technical Report: Developing a Working Data Hub

Associative Array Model of SQL, NoSQL, and NewSQL Databases

Benchmarking SciDB Data Import on HPC Systems

Benchmarking the Graphulo Processing Framework

Enhancing HPC Security with a User-Based Firewall

From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database

Julia Implementation of the Dynamic Distributed Dimensional Data Model

LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

Novel Graph Processor Architecture, Prototype System, and Results

PageRank Pipeline Benchmark: Proposal for a Holistic System Benchmark for Big-Data Platforms

Scalability of VM Provisioning Systems

The BigDAWG Architecture

The BigDAWG Polystore System and Architecture

Algebraic Conditions for Generating Accurate Adjacency Arrays

Associative Arrays: Unified Mathematics for Spreadsheets, Databases, Matrices, and Graphs

Big Data Strategies for Data Center Infrastructure Management Using a 3D Gaming Platform

Computing on Masked Data to improve the Security of Big Data

Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

Graphulo: Linear Algebra Graph Kernels for NoSQL Databases

Improving Big Data Visual Analytics with Interactive Virtual Reality

Large Enforced Sparse Non-Negative Matrix Factorization

Lustre, Hadoop, Accumulo

Parallel Vectorized Algebraic AES in MATLAB for Rapid Prototyping of Encrypted Sensor Processing Algorithms and Database Analytics

Using a Big Data Database to Identify Pathogens in Protein Data Space

Achieving 100,000,000 database inserts per second using Accumulo and D4M

Big Data Dimensional Analysis

Computing on Masked Data: a High Performance Method for Improving Big Data Veracity

D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database

Genetic Sequence Matching Using D4M Big Data Approaches

Percolation Model of Insider Threats to Assess the Optimum Number of Rules

Standards for Graph Algorithm Primitives

pMatlab Parallel Matlab Library

MatlabMPI

A Multi-Threaded Fast Convolver for Dynamically Parallel Image Filtering

Exploiting VSIPL and OpenMP for Parallel Image Processing

Cluster Detection in Astronomical Databases: the Adaptive Matched Filter Algorithm and Implementation

An Automated Cluster Finder: the Adaptive Matched Filter

Inside-out Galaxy Formation

Interfacing Interpreted and Compiled Languages to Support Applications on a Massively Parallel Network of Workstations (MP-NOW)