Paper detail

Evaluation of Distributed Data Processing Frameworks in Hybrid Clouds

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed data processing frameworks hosted in private and public clouds. However, there is a paucity of research on evaluating the performance of these frameworks hosted in a hybrid cloud, which is an emerging cloud model that integrates private and public clouds to use the best of both worlds. Therefore, in this paper, we evaluate the performance of Hadoop, Spark, and Flink in a hybrid cloud in terms of execution time, resource utilization, horizontal scalability, vertical scalability, and cost. For this study, our hybrid cloud consists of OpenStack (private cloud) and MS Azure (public cloud). We use both batch and iterative workloads for the evaluation. Our results show that in a hybrid cloud (i) the execution time increases as more nodes are borrowed by the private cloud from the public cloud, (ii) Flink outperforms Spark, which in turn outperforms Hadoop in terms of execution time, (iii) Hadoop transfers the largest amount of data among the nodes during the workload execution while Spark transfers the least amount of data, (iv) all three frameworks horizontally scale better as compared to vertical scaling, and (v) Spark is found to be least expensive in terms of $ cost for data processing while Hadoop is found the most expensive.

preprint2022arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.