Source author record

Liya Fan

Liya Fan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators


Preprint · 2014 · arXiv

Improving the Load Balance of MapReduce Operations based on the Key Distribution of Pairs

Load balance is important for MapReduce because it reduces job duration and increases parallel efficiency, among other benefits. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling of MapReduce operations, where each operation represents one invocation of the Map or Reduce function. Scheduling MapReduce operations is difficult because operation loads are highly skewed, MapReduce provides no support for collecting workload statistics, and the scheduling problem itself is highly complex. Current implementations therefore adopt simple strategies, which lead to poor load balance. To address these difficulties, we design an algorithm that schedules operations based on the key distribution of intermediate pairs. The algorithm involves a subproblem of selecting operations for task slots, which we name the Balanced Subset Sum (BSS) problem. We discuss properties of BSS and design exact and approximation algorithms for it. To incorporate these algorithms into MapReduce transparently, we design a communication mechanism to collect statistics and a pipeline within Reduce tasks to increase resource utilization. To the best of our knowledge, this is the first work to schedule MapReduce workload at this fine-grained level. Experiments on PUMA [T+12] benchmarks show consistent performance improvement: job duration is reduced by up to 37% compared with standard MapReduce.
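The abstract does not define BSS precisely, so the following is only a minimal sketch under an assumed interpretation: given the load of each operation (for example, the number of intermediate pairs sharing one key), pick a subset of operations for a task slot whose total load is as large as possible without exceeding a target such as the ideal per-slot share. It uses the classic pseudo-polynomial 0/1 subset-sum dynamic program; the function name best_subset and the target parameter are hypothetical, and this is not presented as the paper's exact or approximation algorithm.

```python
from __future__ import annotations


def best_subset(loads: list[int], target: int) -> tuple[int, list[int]]:
    """Hypothetical slot filler: choose operation indices whose total load is
    as large as possible without exceeding `target`.

    Classic 0/1 subset-sum DP, O(len(loads) * target) time; loads must be
    non-negative integers (e.g. pair counts per key).
    """
    # reach[s] holds one list of operation indices summing exactly to s,
    # or None if no subset seen so far reaches s.
    reach: list[list[int] | None] = [None] * (target + 1)
    reach[0] = []
    for i, load in enumerate(loads):
        # Walk sums downward so each operation is used at most once.
        for s in range(target, load - 1, -1):
            if reach[s] is None and reach[s - load] is not None:
                reach[s] = reach[s - load] + [i]
    best = max(s for s in range(target + 1) if reach[s] is not None)
    return best, reach[best]


if __name__ == "__main__":
    # Five operations with skewed loads; fill one slot up to a load of 12.
    print(best_subset([7, 3, 9, 4, 2], 12))  # → (12, [1, 2])
```

The exact DP is practical only when loads are small integers; for large pair counts one would scale or bucket the loads first, which is where an approximation algorithm becomes attractive.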

Preprint · 2014 · arXiv

OS4M: Achieving Global Load Balance of MapReduce Workload by Scheduling at the Operation Level

The efficiency of MapReduce is closely related to its load balance. Existing work on MapReduce load balance focuses on coarse-grained scheduling. This study concerns fine-grained scheduling of MapReduce operations, where each operation represents one invocation of the Map or Reduce function. By default, MapReduce schedules Reduce operations with hash-based partitioning, which often leads to poor load balance. In addition, the copy phase of Reduce tasks overlaps with Map tasks, which significantly slows the Map tasks through I/O contention. Moreover, the three phases of a Reduce task run in sequence even though each consumes different resources, which leaves resources under-utilized. To overcome these problems, we introduce a set of mechanisms named OS4M (Operation Scheduling for MapReduce) to improve MapReduce's performance. OS4M achieves load balance by collecting statistics on all Map operations and computing a globally optimal schedule for distributing Reduce operations. With OS4M, the copy phase of Reduce tasks no longer overlaps with Map tasks, and the three phases of Reduce tasks are pipelined based on their operation loads. OS4M has been incorporated into MapReduce transparently. Evaluations on standard benchmarks show that OS4M shortens job duration by up to 42% compared with a baseline Hadoop.
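To illustrate why statistics on per-key loads help, the sketch below contrasts hash-based assignment of Reduce operations with a load-aware greedy assignment. It is a simplified model, not OS4M: the load-aware side uses the well-known Longest Processing Time (LPT) greedy heuristic, swapped in for illustration, rather than the globally optimal schedule the abstract describes, and zlib.crc32 stands in for the key hash. The function names and the sample key loads are hypothetical.

```python
from __future__ import annotations

import heapq
import zlib


def hash_partition(key_loads: dict[str, int], reducers: int) -> list[int]:
    """Hash-style assignment: reducer = hash(key) mod R.
    Returns the total load each reducer receives."""
    totals = [0] * reducers
    for key, load in key_loads.items():
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        totals[zlib.crc32(key.encode()) % reducers] += load
    return totals


def lpt_partition(key_loads: dict[str, int], reducers: int) -> list[int]:
    """Load-aware greedy (LPT): give the heaviest remaining key to the
    currently least-loaded reducer. Needs per-key load statistics."""
    heap = [(0, r) for r in range(reducers)]  # (current load, reducer id)
    totals = [0] * reducers
    for _, load in sorted(key_loads.items(), key=lambda kv: kv[1], reverse=True):
        current, r = heapq.heappop(heap)
        totals[r] = current + load
        heapq.heappush(heap, (totals[r], r))
    return totals


if __name__ == "__main__":
    # Skewed word counts, as intermediate pairs per key might look.
    key_loads = {"the": 40, "of": 25, "and": 20, "to": 15, "a": 10, "in": 10}
    print("hash:", hash_partition(key_loads, 3))
    print("lpt: ", lpt_partition(key_loads, 3))  # → lpt:  [40, 45, 35]
```

Job duration is governed by the most loaded reducer, so the quantity to compare is max(totals). LPT here reaches 45 against a lower bound of 40 (total 120 over 3 reducers), which shows that a greedy heuristic is not optimal and why an exact schedule can still pay off.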

Preprint · 2013 · arXiv

Energy-Efficient Scheduling with Time and Processors Eligibility Restrictions

While previous work on energy-efficient algorithms assumed that tasks can be assigned to any processor, we initiate the study of task scheduling on restricted parallel processors. The objective is to minimize overall energy consumption, using the speed scaling (SS) method to reduce energy under an execution-time constraint (makespan $C_{max}$). We consider the continuous model, in which processors can run at any speed in $[s_{min}, s_{max}]$. The energy-efficient scheduling problem, which involves both task assignment and speed scaling, is inherently hard: it is NP-complete. We formulate it as an Integer Programming (IP) problem. For the case in which all tasks have uniform size, we devise a polynomial-time optimal scheduling algorithm that runs in $O(mn^3 \log n)$ time, where $m$ is the number of processors and $n$ is the number of tasks. We then present a polynomial-time algorithm that achieves an approximation factor of $2^{\alpha-1}\left(2-\frac{1}{m^{\alpha}}\right)$, where $\alpha$ is the power parameter, when tasks have arbitrary amounts of work. Experimental results demonstrate that our algorithm can produce efficient schedules for task scheduling on restricted parallel processors.
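The sketch below evaluates the energy of a given assignment under assumptions that the abstract does not spell out: power at speed $s$ is $s^{\alpha}$, idle power is zero, and each processor runs all of its assigned work $W$ at one constant speed $s = \max(s_{min}, W / C_{max})$, giving energy $W \cdot s^{\alpha-1}$. Constant speed is energy-optimal for a convex power function with a fixed amount of work and a fixed deadline. It also checks the eligibility restriction and computes the stated approximation bound. The names schedule_energy and approx_factor are hypothetical, and this is an evaluator, not the paper's optimal or approximation algorithm.

```python
from __future__ import annotations


def schedule_energy(work: list[float], eligible: list[set[int]],
                    assignment: list[int], m: int, c_max: float,
                    s_min: float, s_max: float, alpha: float) -> float:
    """Energy of an assignment under the assumed power model P(s) = s**alpha.

    assignment[j] is the processor chosen for task j; it must belong to
    eligible[j]. Raises ValueError if the makespan c_max cannot be met.
    """
    loads = [0.0] * m
    for task, proc in enumerate(assignment):
        if proc not in eligible[task]:
            raise ValueError(f"task {task} is not eligible for processor {proc}")
        loads[proc] += work[task]

    energy = 0.0
    for load in loads:
        if load == 0:
            continue  # assumption: an idle processor consumes no energy
        # Slowest speed that still finishes by c_max, but never below s_min.
        speed = max(s_min, load / c_max)
        if speed > s_max:
            raise ValueError("makespan constraint infeasible at s_max")
        # energy = power * time = speed**alpha * (load / speed)
        energy += load * speed ** (alpha - 1)
    return energy


def approx_factor(m: int, alpha: float) -> float:
    """The bound 2^(alpha-1) * (2 - 1/m^alpha) stated in the abstract."""
    return 2 ** (alpha - 1) * (2 - 1 / m ** alpha)


if __name__ == "__main__":
    work = [2.0, 2.0, 4.0]
    eligible = [{0, 1}, {1}, {0, 1}]  # task 1 may only run on processor 1
    args = dict(m=2, c_max=4.0, s_min=0.5, s_max=2.0, alpha=3.0)
    print(schedule_energy(work, eligible, [0, 1, 0], **args))  # → 14.0
    print(schedule_energy(work, eligible, [1, 1, 0], **args))  # → 8.0
    print(approx_factor(2, 3.0))                               # → 7.5
```

Balancing work across eligible processors lowers energy because $W^{\alpha}$ is convex: two loads of 4 cost less than loads of 6 and 2 under the same deadline.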