High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms

We propose, implement, and experimentally evaluate a runtime middleware system that supports high-throughput execution of large-scale analysis applications on hybrid cluster machines. A hybrid cluster machine consists of computation nodes that each have multiple CPUs and general-purpose graphics processing units (GPUs). Our work targets scientific analysis applications in which datasets are processed in application-specific data chunks, and the processing of each data chunk is expressed as a hierarchical pipeline of operations. The proposed middleware system combines bag-of-tasks-style execution with coarse-grain dataflow execution. Data chunks and their associated processing pipelines are scheduled across cluster nodes using a demand-driven approach, while within a node the operations of a given pipeline instance are scheduled across CPUs and GPUs. The runtime system implements several optimizations, including performance-aware task scheduling, architecture-aware process placement, data-locality-conscious task assignment, data prefetching, and asynchronous data copy, to maximize utilization of the aggregate computing power of the CPUs and GPUs and to minimize data copy overheads. The application and performance benefits of the runtime middleware are demonstrated using an image analysis application employed in a brain cancer study, on a state-of-the-art hybrid cluster in which each node has two 6-core CPUs and three GPUs. Our results show that implementing and scheduling application data processing as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than a coarser-grain, monolithic implementation. The proposed runtime system can achieve high-throughput processing of large datasets: we were able to process an image dataset consisting of 36,848 4K×4K-pixel image tiles at a rate of about 150 tiles/second on 100 nodes.
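The abstract describes two scheduling levels: demand-driven distribution of data chunks across nodes, and performance-aware placement of pipeline operations on CPUs or GPUs within a node. The Python sketch below illustrates both ideas under simplifying assumptions; it is not the authors' middleware. The `Op` cost model, the function names `demand_driven_assign` and `pick_next`, the operation names, and all timings and speedups are hypothetical. The in-node rule (an idle GPU takes the ready operation with the highest estimated speedup, an idle CPU core the lowest) is one plausible reading of "performance-aware task scheduling", assumed here for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One fine-grain operation in a pipeline instance (hypothetical cost model)."""
    name: str
    cpu_time: float     # assumed run time on one CPU core, in seconds
    gpu_speedup: float  # assumed GPU speedup over one CPU core


def demand_driven_assign(chunk_costs, num_nodes):
    """Bag-of-tasks across nodes: each chunk goes to whichever node asks for
    work first, i.e. the node that becomes idle earliest."""
    idle_at = [(0.0, node) for node in range(num_nodes)]  # (time node is free, node id)
    heapq.heapify(idle_at)
    assignment = {node: [] for node in range(num_nodes)}
    for chunk_id, cost in enumerate(chunk_costs):
        free_time, node = heapq.heappop(idle_at)  # earliest-idle node requests work
        assignment[node].append(chunk_id)
        heapq.heappush(idle_at, (free_time + cost, node))
    makespan = max(t for t, _ in idle_at)  # time the last node finishes
    return assignment, makespan


def pick_next(ready_ops, device):
    """Performance-aware choice inside a node (assumed rule): an idle GPU takes
    the ready op with the highest estimated speedup, an idle CPU core the lowest."""
    def by_speedup(op):
        return op.gpu_speedup

    if device == "gpu":
        chosen = max(ready_ops, key=by_speedup)
    else:
        chosen = min(ready_ops, key=by_speedup)
    ready_ops.remove(chosen)
    return chosen


if __name__ == "__main__":
    # One expensive chunk and three cheap ones on two nodes: demand-driven
    # pulling lets node 1 absorb the cheap chunks while node 0 works on chunk 0.
    assignment, makespan = demand_driven_assign([4.0, 1.0, 1.0, 1.0], num_nodes=2)
    print(assignment, makespan)  # {0: [0], 1: [1, 2, 3]} 4.0

    ready = [Op("op_a", 10.0, 14.0), Op("op_b", 2.0, 1.2), Op("op_c", 8.0, 5.0)]
    print(pick_next(ready, "gpu").name, pick_next(ready, "cpu").name)  # op_a op_b
```

Pulling work on demand, rather than partitioning chunks statically, keeps nodes busy when chunk costs vary. For scale, the reported rate implies roughly 36,848 / 150 ≈ 246 seconds to process the whole dataset on 100 nodes.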

Preprint · 2012 · arXiv · Open access

6 authors · 2 topics