Source author record

Nikzad Babaii Rizvandi

Nikzad Babaii Rizvandi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Distributed, Parallel, and Cluster Computing; Performance; Machine Learning; Artificial Intelligence; Networking and Internet Architecture

Catalog footprint

What is connected

12 works

5 topics

4 close collaborators

Preprint, 2014, arXiv

Performance Provisioning and Energy Efficiency in Cloud and Distributed Computing Systems

In recent years, energy consumption in high performance computing (HPC) systems has attracted a great deal of attention. In response, many energy-aware algorithms have been developed at different layers of HPC systems, including the hardware, service and system layers. These algorithms are of two types: first, algorithms that directly try to reduce energy consumption by adjusting the operating frequency or the scheduling; and second, algorithms that focus on improving system performance, on the assumption that a system running efficiently may indirectly save energy. In this thesis, we develop algorithms of both types. First, we introduce three algorithms that directly reduce the energy consumed by scheduled tasks at the hardware level using Dynamic Voltage Frequency Scaling (DVFS). Second, we propose two algorithms for modelling and resource provisioning of MapReduce applications (a well-known parametric distributed framework currently used by Google, Yahoo, Facebook and LinkedIn) based on their configuration parameters. Estimates of the performance (e.g., execution time or CPU clock ticks) of a MapReduce application can later be used for smart scheduling of such applications in clouds or clusters. To evaluate the algorithms, we conducted extensive simulations and real experiments on a 5-node physical cluster with up to 25 virtual nodes, using both synthetic and real-world applications. The proposed algorithms are also compared experimentally with existing algorithms, and the results reveal new information on the performance of these algorithms, as well as on the properties of MapReduce and DVFS. Finally, three open problems revealed by the experimental observations are presented, and their importance is explained.

Preprint, 2013, arXiv

A Study on Using Uncertain Time Series Matching Algorithms in MapReduce Applications

In this paper, we study the CPU utilization time patterns of several MapReduce applications. After the running patterns of several applications are extracted, the patterns and their statistical information are saved in a reference database, to be used later to tune system parameters so that unknown applications can be executed efficiently. To achieve this goal, the CPU utilization patterns of new applications, along with their statistical information, are compared with the known patterns in the reference database to predict their most probable execution patterns. Because the patterns differ in length, Dynamic Time Warping (DTW) is used for the comparison; a statistical analysis is then applied to the DTW outcomes to select the most suitable candidates. Moreover, under a hypothesis, another algorithm is proposed to classify applications with similar CPU utilization patterns. Three widely used text processing applications (WordCount, Distributed Grep, and TeraSort) and another application (Exim Mainlog parsing) are used to evaluate our hypothesis about tuning system parameters when executing similar applications. The results were very promising and showed the effectiveness of our approach on a 5-node MapReduce platform.
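As a rough illustration of the comparison step, the sketch below computes a textbook DTW distance between a new CPU utilization trace and each stored pattern, then picks the closest one. The traces, the application names used as keys, and the helper `closest_reference` are hypothetical; the abstract's subsequent statistical analysis of the DTW outcomes is not reproduced here, and a plain nearest-distance rule stands in for it.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match in b
                                 cost[i][j - 1],      # b[j-1] repeats a match in a
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def closest_reference(pattern, reference_db):
    """Return the key of the stored pattern with the smallest DTW distance.

    Assumption: nearest distance replaces the statistical selection step.
    """
    return min(reference_db, key=lambda name: dtw_distance(pattern, reference_db[name]))

# Hypothetical CPU utilization traces (percent), deliberately of different lengths.
reference_db = {
    "wordcount": [10, 80, 85, 80, 20],
    "terasort": [10, 40, 90, 40, 90, 40, 10],
}
new_trace = [10, 10, 80, 85, 85, 80, 20]  # a time-stretched copy of "wordcount"

print(closest_reference(new_trace, reference_db))  # prints: wordcount
```

Because DTW allows one sample to align with several samples of the other sequence, the stretched trace matches its shorter original with zero cost, which a point-by-point comparison could not do for sequences of unequal length.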

Preprint, 2013, arXiv

Data-Intensive Workload Consolidation on Hadoop Distributed File System

Workload consolidation, the sharing of physical resources among multiple workloads, is a promising technique for saving cost and energy in cluster computing systems. This paper highlights a few challenges of workload consolidation for Hadoop, one of the current state-of-the-art data-intensive cluster computing systems. Through a systematic step-by-step procedure, we investigate the challenges of efficient server consolidation in Hadoop environments. To this end, we first investigate the relationship between last level cache (LLC) contention and throughput degradation for consolidated workloads on a single physical server running the Hadoop Distributed File System (HDFS). We then investigate the general case of consolidation on multiple physical servers such that their throughput never falls below a desired, predefined utilization level. We use our empirical results to model consolidation as a classic two-dimensional bin packing problem and then design a computationally efficient greedy algorithm that aims for minimum throughput degradation on multiple servers. The results are very promising and show that our greedy approach achieves a near-optimal solution in all of the cases tested.
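To make the bin packing view concrete, here is a minimal sketch of a greedy placement. It is a generic first-fit-decreasing heuristic, not the algorithm from the paper, and it treats the two dimensions as independent resource demands (a vector-packing simplification). The workload names, the demand values, and the per-server capacity are all hypothetical.

```python
def first_fit_decreasing(workloads, capacity):
    """Greedily pack (name, (d1, d2)) workloads onto servers of capacity (c1, c2).

    Assumption: the two dimensions are additive resource demands, and a
    workload fits if neither dimension exceeds the server capacity.
    """
    servers = []  # each server: {"used": [u1, u2], "items": [workload names]}
    # Place the most demanding workloads first, ranked by largest normalized demand.
    order = sorted(
        workloads,
        key=lambda w: max(w[1][0] / capacity[0], w[1][1] / capacity[1]),
        reverse=True,
    )
    for name, (d1, d2) in order:
        for server in servers:
            fits = (server["used"][0] + d1 <= capacity[0]
                    and server["used"][1] + d2 <= capacity[1])
            if fits:
                server["used"][0] += d1
                server["used"][1] += d2
                server["items"].append(name)
                break
        else:
            # No existing server has room in both dimensions: open a new one.
            servers.append({"used": [d1, d2], "items": [name]})
    return servers

# Hypothetical demands in percent of one server, for two illustrative dimensions.
workloads = [
    ("A", (60, 30)),
    ("B", (50, 50)),
    ("C", (40, 20)),
    ("D", (30, 60)),
    ("E", (20, 10)),
]
for server in first_fit_decreasing(workloads, capacity=(100, 100)):
    print(server["items"], server["used"])
```

Sorting by the largest normalized demand is one common ordering choice; other orderings (for example, by the sum of the two demands) can produce different placements on the same input.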

Preprint, 2013, arXiv

Pattern Matching for Self-Tuning of MapReduce Jobs

In this paper, we study the CPU utilization time patterns of several MapReduce applications. After the running patterns of several applications are extracted, they are saved in a reference database, to be used later to tune system parameters so that unknown applications can be executed efficiently. To achieve this goal, the CPU utilization patterns of new applications are compared with the known patterns in the reference database to predict their most probable execution patterns. Because the patterns differ in length, Dynamic Time Warping (DTW) is used for the comparison; a correlation analysis is then applied to the DTW outcomes to produce feasible similarity patterns. Three real applications (WordCount, Exim Mainlog parsing and TeraSort) are used to evaluate our hypothesis about tuning system parameters when executing similar applications. The results were very promising and showed the effectiveness of our approach on pseudo-distributed MapReduce platforms.

Preprint, 2013, arXiv

Statistical Regression to Predict Total Cumulative CPU Usage of MapReduce Jobs

Recently, businesses have started using MapReduce as a popular computation framework for processing large amounts of data, for tasks such as spam detection and data mining, in both public and private clouds. Two of the challenging questions in such environments are (1) choosing suitable values for MapReduce configuration parameters (e.g., number of mappers, number of reducers, and DFS block size), and (2) predicting the amount of resources that a user should lease from the service provider. Currently, both choosing configuration parameters and estimating required resources are solely the users' responsibility. In this paper, we present an approach to provisioning the total CPU usage, in clock cycles, of jobs in a MapReduce environment. For a MapReduce job, a profile of total CPU usage in clock cycles is built from the job's past executions with different values of two configuration parameters, e.g., the number of mappers and the number of reducers. Then, polynomial regression is used to model the relation between these configuration parameters and the job's total CPU usage in clock cycles. We also briefly study the influence of input data scaling on the measured total CPU usage in clock cycles. The derived model, along with the scaling result, can then be used to provision the total CPU usage in clock cycles of the same jobs with different input data sizes. We validate the accuracy of our models using three realistic applications (WordCount, Exim MainLog parsing, and TeraSort). Results show that, for the generated resource provisioning options, the predicted total CPU usage in clock cycles differs from the measured total CPU usage by less than 8% in our 20-node virtual Hadoop cluster.
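The profile-then-fit idea can be sketched as follows, under loud assumptions: the model is taken to be a second-order polynomial in the number of mappers and reducers (the abstract does not state the degree), the profile is synthetic data generated by a made-up function `true_cpu`, and the fit uses ordinary least squares via the normal equations. None of the numbers come from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def features(m, r):
    """Second-order polynomial terms in mappers m and reducers r (assumed form)."""
    return [1.0, m, r, m * m, m * r, r * r]

def fit(profile):
    """Least-squares fit by solving the normal equations X^T X w = X^T y."""
    X = [features(m, r) for m, r, _ in profile]
    y = [cpu for _, _, cpu in profile]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(w, m, r):
    """Evaluate the fitted polynomial at a new (mappers, reducers) setting."""
    return sum(wi * fi for wi, fi in zip(w, features(m, r)))

def true_cpu(m, r):
    """Made-up ground truth used only to generate a synthetic profile."""
    return 100 + 3 * m + 5 * r + 0.5 * m * m + 0.2 * m * r + r * r

# Synthetic "past executions": (mappers, reducers, total CPU usage).
profile = [(m, r, true_cpu(m, r)) for m in (4, 8, 12, 16) for r in (2, 4, 6)]
w = fit(profile)
print(round(predict(w, 10, 5), 3), true_cpu(10, 5))
```

With real measurements the profile would be noisy, so the fitted model would only approximate the measured usage; the synthetic data here is noise-free purely to keep the example checkable.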

Preprint, 2012, arXiv

A Primarily Survey on Energy Efficiency in Cloud and Distributed Computing Systems

A survey of available hardware techniques for reducing energy consumption.

Preprint, 2012, arXiv

Mobile P2P Trusted On-Demand Video Streaming

We propose to demonstrate a mobile, server-assisted P2P system for on-demand video streaming. Our proposed solution uses a combination of 3G and ad-hoc Wi-Fi connections to enable mobile devices to download content from a centralised server in a way that minimises 3G bandwidth use and cost. On the customised GUI, we show the reduction in 3G bandwidth achieved by increasing the number of mobile devices participating in the combined P2P and ad-hoc Wi-Fi network, while demonstrating good video playout quality on each of the mobiles. We also demonstrate the implemented trust mechanism, which enables mobiles to use only trusted ad-hoc connections. The system has been implemented on Android-based smartphones.

Preprint, 2012, arXiv

Multiple Frequency Selection in DVFS-Enabled Processors to Minimize Energy Consumption

In this chapter we focus on slack reclamation and propose a new slack reclamation technique, Multiple Frequency Selection DVFS (MFS-DVFS). The key idea is to execute each task with a linear combination of more than one frequency, chosen so that the combination uses the lowest energy while covering the whole slack time of the task. We have tested our algorithm with both random and real-world application task graphs and compared it with the results of previous research in [9] and [12-13]. The experimental results show that our approach achieves energy savings almost identical to the optimum.

Preprint, 2012, arXiv

Network Load Analysis and Provisioning of MapReduce Applications

In this paper, we study the dependency between configuration parameters and the network load of fixed-size MapReduce applications in the shuffle phase, and then propose an analytical method to model this dependency. Our approach consists of three key phases: profiling, modeling, and prediction. In the first phase, an application is run several times with different sets of MapReduce configuration parameters (here, the number of mappers and the number of reducers) to profile the network load of the application in the shuffle phase on a given cluster. Then, the relation between these parameters and the network load is modeled by multivariate linear regression. Three applications (WordCount, Exim Mainlog parsing, and TeraSort) are used to evaluate our technique on a 4-node private MapReduce cluster.

Preprint, 2012, arXiv

On Modeling Dependency between MapReduce Configuration Parameters and Total Execution Time

In this paper, we propose an analytical method to model the dependency between configuration parameters and total execution time of Map-Reduce applications. Our approach has three key phases: profiling, modeling, and prediction. In profiling, an application is run several times with different sets of MapReduce configuration parameters to profile the execution time of the application on a given platform. Then in modeling, the relation between these parameters and total execution time is modeled by multivariate linear regression. Among the possible configuration parameters, two main parameters have been used in this study: the number of Mappers, and the number of Reducers. For evaluation, two standard applications (WordCount, and Exim Mainlog parsing) are utilized to evaluate our technique on a 4-node MapReduce platform.

Preprint, 2012, arXiv

On Modelling and Prediction of Total CPU Usage for Applications in MapReduce Environments

Recently, businesses have started using MapReduce as a popular computation framework for processing large amounts of data, for tasks such as spam detection and data mining, in both public and private clouds. Two of the challenging questions in such environments are (1) choosing suitable values for MapReduce configuration parameters (e.g., number of mappers, number of reducers, and DFS block size), and (2) predicting the amount of resources that a user should lease from the service provider. Currently, both choosing configuration parameters and estimating required resources are solely the users' responsibility. In this paper, we present an approach to provisioning the total CPU usage, in clock cycles, of jobs in a MapReduce environment. For a MapReduce job, a profile of total CPU usage in clock cycles is built from the job's past executions with different values of two configuration parameters, e.g., the number of mappers and the number of reducers. Then, polynomial regression is used to model the relation between these configuration parameters and the job's total CPU usage in clock cycles. We also briefly study the influence of input data scaling on the measured total CPU usage in clock cycles. The derived model, along with the scaling result, can then be used to provision the total CPU usage in clock cycles of the same jobs with different input data sizes. We validate the accuracy of our models using three realistic applications (WordCount, Exim MainLog parsing, and TeraSort). Results show that, for the generated resource provisioning options, the predicted total CPU usage in clock cycles differs from the measured total CPU usage by less than 8% in our 20-node virtual Hadoop cluster.

Preprint, 2012, arXiv

Some Observations on Optimal Frequency Selection in DVFS-based Energy Consumption Minimization

In recent years, energy consumption in parallel and distributed computing systems has attracted a great deal of attention. In response, many energy-aware scheduling algorithms have been developed, primarily using the dynamic voltage-frequency scaling (DVFS) capability that has been incorporated into recent commodity processors. The majority of these algorithms involve two passes: schedule generation and slack reclamation. The former pass redistributes tasks among DVFS-enabled processors based on a given cost function that includes makespan and energy consumption, while the latter is typically achieved by executing individual tasks that have slack at a lower processor frequency. In this paper, a new slack reclamation algorithm is proposed by approaching the energy reduction problem from a different angle. Firstly, the problem of task slack reclamation using combinations of processor frequencies is formulated. Secondly, several proofs are provided to show that (1) if the working frequency set of the processor is assumed to be continuous, the optimal energy is always achieved by using only one frequency, (2) for real processors with a discrete set of working frequencies, the optimal energy is always achieved by using at most two frequencies, and (3) these two frequencies are adjacent (neighbouring) when processor energy consumption is a convex function of frequency. Thirdly, a novel algorithm for finding the combination of frequencies that yields the optimal energy is presented. The algorithm has been evaluated on results obtained from experiments with three different sets of task graphs: 3000 randomly generated task graphs, and 600 task graphs for two popular applications (Gauss-Jordan and LU decomposition). The results show the superiority of the proposed algorithm in comparison with other techniques.
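The two-adjacent-frequencies result can be illustrated with a small worked sketch. It is not the paper's algorithm: it assumes a task needs a fixed number of cycles, must finish within a deadline (execution time plus slack), runs on a processor whose power is a convex function of frequency (a hypothetical cubic model here), and consumes no energy while idle. The function names and all numbers are illustrative.

```python
def two_frequency_split(cycles, deadline, freqs, power):
    """Fill the whole deadline using the two discrete frequencies that bracket
    the ideal continuous frequency cycles / deadline.

    Returns ([(frequency, duration), ...], energy).
    """
    freqs = sorted(freqs)
    ideal = cycles / deadline
    if ideal > freqs[-1]:
        raise ValueError("task cannot finish by the deadline")
    if ideal <= freqs[0]:
        # The lowest frequency already meets the deadline on its own.
        t = cycles / freqs[0]
        return [(freqs[0], t)], power(freqs[0]) * t
    for lo, hi in zip(freqs, freqs[1:]):
        if lo <= ideal <= hi:
            # Solve t_lo + t_hi = deadline and lo*t_lo + hi*t_hi = cycles.
            t_hi = (cycles - lo * deadline) / (hi - lo)
            t_lo = deadline - t_hi
            energy = power(lo) * t_lo + power(hi) * t_hi
            return [(lo, t_lo), (hi, t_hi)], energy

def single_frequency(cycles, deadline, freqs, power):
    """Baseline: the slowest single frequency that still meets the deadline."""
    f = min(f for f in freqs if f * deadline >= cycles)
    t = cycles / f
    return [(f, t)], power(f) * t

# Hypothetical processor: frequencies in GHz, power proportional to f^3.
freqs = [1.0, 2.0, 3.0]
cubic_power = lambda f: f ** 3

# 15 giga-cycles of work, 10 seconds available (execution time plus slack).
plan, energy = two_frequency_split(15.0, 10.0, freqs, cubic_power)
base_plan, base_energy = single_frequency(15.0, 10.0, freqs, cubic_power)
print(plan, energy)            # [(1.0, 5.0), (2.0, 5.0)] 45.0
print(base_plan, base_energy)  # [(2.0, 7.5)] 60.0
```

In this example the ideal continuous frequency is 1.5 GHz, which the processor does not offer. Splitting the time between the neighbouring 1 GHz and 2 GHz settings uses less energy under the assumed power model than running at 2 GHz and idling for the remaining slack.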
