Source author record

Dylan Hutchison

Dylan Hutchison appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Mathematical Software Programming Languages Distributed, Parallel, and Cluster Computing Performance Data Structures and Algorithms Machine Learning

Catalog footprint

What is connected

8works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

Associative Array Model of SQL, NoSQL, and NewSQL Databases

The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Associative arrays provide a common approach to the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL. These properties include associativity, commutativity, distributivity, identities, annihilators, and inverses. Performance measurements on distributivity and associativity show the impact these properties can have on associative array operations. These results demonstrate that associative arrays could provide a mathematical model for polystores to optimize the exchange of data and execution queries.

preprint2016arXiv

Benchmarking the Graphulo Processing Framework

Graph algorithms have wide applicablity to a variety of domains and are often used on massive datasets. Recent standardization efforts such as the GraphBLAS specify a set of key computational kernels that hardware and software developers can adhere to. Graphulo is a processing framework that enables GraphBLAS kernels in the Apache Accumulo database. In our previous work, we have demonstrated a core Graphulo operation called \textit{TableMult} that performs large-scale multiplication operations of database tables. In this article, we present the results of scaling the Graphulo engine to larger problems and scalablity when a greater number of resources is used. Specifically, we present two experiments that demonstrate Graphulo scaling performance is linear with the number of available resources. The first experiment demonstrates cluster processing rates through Graphulo's TableMult operator on two large graphs, scaled between $2^{17}$ and $2^{19}$ vertices. The second experiment uses TableMult to extract a random set of rows from a large graph ($2^{19}$ nodes) to simulate a cued graph analytic. These benchmarking results are of relevance to Graphulo users who wish to apply Graphulo to their graph problems.

preprint2016arXiv

From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database

Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how it is possible to implement the GraphBLAS kernels in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results shed insight into the conditions that determine when executing a graph algorithm is faster inside a database versus an external system---in short, that memory requirements and relative I/O are critical factors.

preprint2016arXiv

Julia Implementation of the Dynamic Distributed Dimensional Data Model

Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.

preprint2016arXiv

Lara: A Key-Value Algebra underlying Arrays and Relations

Data processing systems roughly group into families such as relational, array, graph, and key-value. Many data processing tasks exceed the capabilities of any one family, require data stored across families, or run faster when partitioned onto multiple families. Discovering ways to execute computation among multiple available systems, let alone discovering an optimal execution plan, is challenging given semantic differences between disparate families of systems. In this paper we introduce a new algebra, Lara, which underlies and unifies algebras representing the families above in order to facilitate translation between systems. We describe the operations and objects of Lara---union, join, and ext on associative tables---and show her properties and equivalences to other algebras. Multi-system optimization has a bright future, in which we proffer Lara for the role of universal connector.

preprint2016arXiv

ModelWizard: Toward Interactive Model Construction

Data scientists engage in model construction to discover machine learning models that well explain a dataset, in terms of predictiveness, understandability and generalization across domains. Questions such as "what if we model common cause Z" and "what if Y's dependence on X reverses" inspire many candidate models to consider and compare, yet current tools emphasize constructing a final model all at once. To more naturally reflect exploration when debating numerous models, we propose an interactive model construction framework grounded in composable operations. Primitive operations capture core steps refining data and model that, when verified, form an inductive basis to prove model validity. Derived, composite operations enable advanced model families, both generic and specialized, abstracted away from low-level details. We prototype our envisioned framework in ModelWizard, a domain-specific language embedded in F# to construct Tabular models. We enumerate language design and demonstrate its use through several applications, emphasizing how language may facilitate creation of complex models. To future engineers designing data science languages and tools, we offer ModelWizard's design as a new model construction paradigm, speeding discovery of our universe's structure.

preprint2015arXiv

Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we implement GraphBLAS sparse matrix multiplication server-side by leveraging Accumulo's native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulo's peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo.

preprint2015arXiv

Graphulo: Linear Algebra Graph Kernels for NoSQL Databases

Big data and the Internet of Things era continue to challenge computational systems. Several technology solutions such as NoSQL databases have been developed to deal with this challenge. In order to generate meaningful results from large datasets, analysts often use a graph representation which provides an intuitive way to work with the data. Graph vertices can represent users and events, and edges can represent the relationship between vertices. Graph algorithms are used to extract meaningful information from these very large graphs. At MIT, the Graphulo initiative is an effort to perform graph algorithms directly in NoSQL databases such as Apache Accumulo or SciDB, which have an inherently sparse data storage scheme. Sparse matrix operations have a history of efficient implementations and the Graph Basic Linear Algebra Subprogram (GraphBLAS) community has developed a set of key kernels that can be used to develop efficient linear algebra operations. However, in order to use the GraphBLAS kernels, it is important that common graph algorithms be recast using the linear algebra building blocks. In this article, we look at common classes of graph algorithms and recast them into linear algebra operations using the GraphBLAS building blocks.

Dylan Hutchison

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Associative Array Model of SQL, NoSQL, and NewSQL Databases

Benchmarking the Graphulo Processing Framework

From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database

Julia Implementation of the Dynamic Distributed Dimensional Data Model

Lara: A Key-Value Algebra underlying Arrays and Relations

ModelWizard: Toward Interactive Model Construction

Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

Graphulo: Linear Algebra Graph Kernels for NoSQL Databases