Paper detail

Optimization and analysis of large scale data sorting algorithm based on Hadoop

When dealing with massive data sorting, we usually use Hadoop which is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach in implement of big data sorting is to use shuffle and sort phase in MapReduce based on Hadoop. However, if we use it directly, the efficiency could be very low and the load imbalance can be a big problem. In this paper we carry out an experimental study of an optimization and analysis of large scale data sorting algorithm based on hadoop. In order to reach optimization, we use more than 2 rounds MapReduce. In the first round, we use a MapReduce to take sample randomly. Then we use another MapReduce to order the data uniformly, according to the results of the first round. If the data is also too big, it will turn back to the first round and keep on. The experiments show that, it is better to use the optimized algorithm than shuffle of MapReduce to sort large scale data.

preprint2015arXivOpen access

Signal facts

What is known right now

Open access4 authors1 topic

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.