Global Optimization of Data Pipelines in Heterogeneous Cloud Environments

Modern production data processing and machine learning pipelines are critical components for many cloud-based companies. These pipelines are typically composed of complex workflows represented by directed acyclic graphs (DAGs). Cloud environments are attractive for these workflows because they offer a wide range of heterogeneous instances and prices, which gives the flexibility to meet different cost-performance needs. However, this flexibility also makes it complex to select the right resource configuration (e.g., instance type, resource demands) for each task in the DAG while simultaneously scheduling the tasks on the selected resources to reach the optimal end-to-end performance and cost. These two decisions are often codependent, resulting in an NP-hard scheduling optimization bottleneck. Existing solutions focus on only one of the two problems and ignore their combined effect on the end-to-end optimum. We propose AGORA, a scheduler that considers task-level resource allocation and execution for DAG workflows as a whole in heterogeneous cloud environments. AGORA (1) studies the characteristics of the tasks from prior runs and predicts resource configurations, and (2) automatically finds the best configuration with its corresponding schedule for the entire workflow under a cost-performance objective. We evaluate AGORA in a heterogeneous Amazon Web Services (AWS) cloud environment with multi-tenant workflows served by Airflow and demonstrate a performance improvement of up to 45% and a cost reduction of up to 77% compared to state-of-the-art schedulers. In addition, we apply AGORA to a real-world production trace from Alibaba and show a cost reduction of 65% and a DAG completion time reduction of 57%.
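To make the coupling between the two decisions concrete, the sketch below jointly picks an instance type per task and evaluates the resulting schedule on a toy DAG. It is not AGORA's algorithm, which the abstract does not specify. It is a brute-force illustration under loud assumptions: the task names, runtimes, and prices are hypothetical, the cluster is assumed to have unlimited parallelism, and the objective is assumed to be a simple weighted sum of completion time and cost.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product

# Hypothetical toy workflow: task -> list of upstream dependencies.
DAG = {
    "extract": [],
    "transform": ["extract"],
    "train": ["transform"],
    "report": ["transform"],
}

# Hypothetical options: task -> instance type -> (runtime_hours, price_per_hour).
OPTIONS = {
    "extract": {"small": (2.0, 1.0), "large": (1.0, 3.0)},
    "transform": {"small": (4.0, 1.0), "large": (2.0, 3.0)},
    "train": {"small": (8.0, 1.0), "large": (2.0, 5.0)},
    "report": {"small": (1.0, 1.0), "large": (0.5, 3.0)},
}


def evaluate(dag, options, choice):
    """Return (makespan, cost) for one instance choice per task.

    Assumes unlimited parallelism: a task starts as soon as all of its
    dependencies have finished.
    """
    finish = {}
    cost = 0.0
    for task in TopologicalSorter(dag).static_order():
        runtime, price = options[task][choice[task]]
        start = max((finish[dep] for dep in dag[task]), default=0.0)
        finish[task] = start + runtime
        cost += runtime * price
    return max(finish.values()), cost


def best_configuration(dag, options, weight):
    """Enumerate every configuration and keep the one with the lowest score.

    Assumed objective: weight * makespan + (1 - weight) * cost.
    The search is exponential in the number of tasks, so it only works
    for toy inputs.
    """
    tasks = list(dag)
    best = None
    for combo in product(*(options[t] for t in tasks)):
        choice = dict(zip(tasks, combo))
        makespan, cost = evaluate(dag, options, choice)
        score = weight * makespan + (1.0 - weight) * cost
        if best is None or score < best[0]:
            best = (score, choice, makespan, cost)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    for weight in (0.0, 0.5, 1.0):
        choice, makespan, cost = best_configuration(DAG, OPTIONS, weight)
        print(weight, choice, makespan, cost)
```

Changing `weight` shifts the preferred instance types along the critical path, which is why choosing configurations per task in isolation can miss the end-to-end optimum. A realistic scheduler would also need capacity limits and predicted rather than fixed runtimes, both of which this sketch leaves out.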

Preprint, 2022, arXiv, open access
