Source author record

Jacob Nelson

Jacob Nelson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Data Structures and Algorithms Machine Learning Neural and Evolutionary Computing

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Bundled References: An Abstraction for Highly-Concurrent Linearizable Range Queries

We present bundled references, a new building block to provide linearizable range query operations for highly concurrent linked data structures. Bundled references allow range queries to traverse a path through the data structure that is consistent with the target atomic snapshot and is made of the minimal amount of nodes that should be accessed to preserve linearizability. We implement our technique into a skip list, a binary search tree, and a linked list data structure. Our evaluation reveals that in mixed workloads, our design improves upon the state-of-the-art techniques by 3.9x for a skip list and 2.1x for a binary search tree. We also integrate our bundled data structure into the DBx1000 in-memory database, yielding up to 20% gain over the same competitors.

preprint2020arXiv

Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training

Most work in the deep learning systems community has focused on faster inference, but arriving at a trained model requires lengthy experiments. Accelerating training lets developers iterate faster and come up with better models. DNN training is often seen as a compute-bound problem, best done in a single large compute node with many GPUs. As DNNs get bigger, training requires going distributed. Distributed deep neural network (DDNN) training constitutes an important workload on the cloud. Larger DNN models and faster compute engines shift the training performance bottleneck from computation to communication. Our experiments show existing DNN training frameworks do not scale in a typical cloud environment due to insufficient bandwidth and inefficient parameter server software stacks.We propose PBox, a balanced, scalable central PS hardware that balances compute and communication resources, and PHub, a high performance parameter server (PS) software design that provides an optimized network stack and a streamlined gradient processing pipeline to benefit common PS setups to utilize PBox. We show that in a typical cloud environment, PBox can achieve up to 3.8x speedup over state-of-the-art designs when training ImageNet. We discuss future directions of integrating PBox with programmable switches for in-network aggregation during training, leveraging the datacenter network topology to reduce bandwidth usage and localize data movement.

preprint2020arXiv

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high performance multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art distributed training techniques for cloud-based ImageNet workloads, with 25% better throughput per dollar.

Jacob Nelson

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Bundled References: An Abstraction for Highly-Concurrent Linearizable Range Queries

Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training