Researcher profile

Shagun Dhingra

Shagun Dhingra contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 11 - UnverifiedVerification L1Unclaimed author
1works
0followers
1topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

1 published item(s)

preprint2022arXiv

Evaluation of Distributed Data Processing Frameworks in Hybrid Clouds

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed data processing frameworks hosted in private and public clouds. However, there is a paucity of research on evaluating the performance of these frameworks hosted in a hybrid cloud, which is an emerging cloud model that integrates private and public clouds to use the best of both worlds. Therefore, in this paper, we evaluate the performance of Hadoop, Spark, and Flink in a hybrid cloud in terms of execution time, resource utilization, horizontal scalability, vertical scalability, and cost. For this study, our hybrid cloud consists of OpenStack (private cloud) and MS Azure (public cloud). We use both batch and iterative workloads for the evaluation. Our results show that in a hybrid cloud (i) the execution time increases as more nodes are borrowed by the private cloud from the public cloud, (ii) Flink outperforms Spark, which in turn outperforms Hadoop in terms of execution time, (iii) Hadoop transfers the largest amount of data among the nodes during the workload execution while Spark transfers the least amount of data, (iv) all three frameworks horizontally scale better as compared to vertical scaling, and (v) Spark is found to be least expensive in terms of $ cost for data processing while Hadoop is found the most expensive.