Source author record

George Amvrosiadis

George Amvrosiadis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Distributed, Parallel, and Cluster Computing; Machine Learning; Networking and Internet Architecture; Performance

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, it is challenging to implement efficient input pipelines, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over two million ML jobs in Google datacenters reveals that a significant fraction of model training jobs could benefit from faster input data pipelines. At the same time, our analysis indicates that most jobs do not saturate host hardware, pointing in the direction of software-based bottlenecks. Motivated by these findings, we propose Plumber, a tool for finding bottlenecks in ML input pipelines. Plumber uses an extensible and interpretable operational analysis analytical model to automatically tune parallelism, prefetching, and caching under host resource constraints. Across five representative ML pipelines, Plumber obtains speedups of up to 47x for misconfigured pipelines. By automating caching, Plumber obtains end-to-end speedups of over 50% compared to state-of-the-art tuners.

Preprint, 2020, arXiv

Unleashing In-network Computing on Scientific Workloads

Many recent efforts have shown that in-network computing can benefit various datacenter applications. In this paper, we explore a relatively less-explored domain which we argue can benefit from in-network computing: scientific workloads in high-performance computing. By analyzing canonical examples of HPC applications, we observe unique opportunities and challenges for exploiting in-network computing to accelerate scientific workloads. In particular, we find that the dynamic and demanding nature of scientific workloads is the major obstacle to the adoption of in-network approaches which are mostly open-loop and lack runtime feedback. In this paper, we present NSinC (Network-accelerated ScIeNtific Computing), an architecture for fully unleashing the potential benefits of in-network computing for scientific workloads by providing closed-loop runtime feedback to in-network acceleration services. We outline key challenges in realizing this vision and a preliminary design to enable acceleration for scientific applications.