Researcher profile

Renato L. F. Cunha

Renato L. F. Cunha contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - Baseline
4works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2016arXiv

Helping HPC Users Specify Job Memory Requirements via Machine Learning

Resource allocation in High Performance Computing (HPC) settings is still not easy for end-users due to the wide variety of application and environment configuration options. Users have difficulties to estimate the number of processors and amount of memory required by their jobs, select the queue and partition, and estimate when job output will be available to plan for next experiments. Apart from wasting infrastructure resources by making wrong allocation decisions, overall user response time can also be negatively impacted. Techniques that exploit batch scheduler systems to predict waiting time and runtime of user jobs have already been proposed. However, we observed that such techniques are not suitable for predicting job memory usage. In this paper we introduce a tool to help users predict their memory requirements using machine learning. We describe the integration of the tool with a batch scheduler system, discuss how batch scheduler log data can be exploited to generate memory usage predictions through machine learning, and present results of two production systems containing thousands of jobs.

preprint2016arXiv

Job Placement Advisor Based on Turnaround Predictions for HPC Hybrid Clouds

Several companies and research institutes are moving their CPU-intensive applications to hybrid High Performance Computing (HPC) cloud environments. Such a shift depends on the creation of software systems that help users decide where a job should be placed considering execution time and queue wait time to access on-premise clusters. Relying blindly on turnaround prediction techniques will affect negatively response times inside HPC cloud environments. This paper introduces a tool to make job placement decisions in HPC hybrid cloud environments taking into account the inaccuracy of execution and waiting time predictions. We used job traces from real supercomputing centers to run our experiments, and compared the performance between environments using real speedup curves. We also extended a state-of-the-art machine learning based predictor to work with data from the cluster scheduler. Our main findings are: (i) depending on workload characteristics, there is a turning point where predictions should be disregarded in favor of a more conservative decision to minimize job turnaround times and (ii) scheduler data plays a key role in improving predictions generated with machine learning using job trace data---our experiments showed around 20% prediction accuracy improvements.

preprint2016arXiv

SLA-aware Interactive Workflow Assistant for HPC Parameter Sweeping Experiments

A common workflow in science and engineering is to (i) setup and deploy large experiments with tasks comprising an application and multiple parameter values; (ii) generate intermediate results; (iii) analyze them; and (iv) reprioritize the tasks. These steps are repeated until the desired goal is achieved, which can be the evaluation/simulation of complex systems or model calibration. Due to time and cost constraints, sweeping all possible parameter values of the user application is not always feasible. Experimental Design techniques can help users reorganize submission-execution-analysis workflows to bring a solution in a more timely manner. This paper introduces a novel tool that leverages users' feedback on analyzing intermediate results of parameter sweeping experiments to advise them about their strategies on parameter selections tied to their SLA constraints. We evaluated our tool with three applications of distinct domains and search space shapes. Our main finding is that users with submission-execution-analysis workflows can benefit from their interaction with intermediate results and adapt themselves according to their domain expertise and SLA constraints.

preprint2013arXiv

Patience-aware Scheduling for Cloud Services: Freeing Users from the Chains of Boredom

Scheduling of service requests in Cloud computing has traditionally focused on the reduction of pre-service wait, generally termed as waiting time. Under certain conditions such as peak load, however, it is not always possible to give reasonable response times to all users. This work explores the fact that different users may have their own levels of tolerance or patience with response delays. We introduce scheduling strategies that produce better assignment plans by prioritising requests from users who expect to receive the results earlier and by postponing servicing jobs from those who are more tolerant to response delays. Our analytical results show that the behaviour of users' patience plays a key role in the evaluation of scheduling techniques, and our computational evaluation demonstrates that, under peak load, the new algorithms typically provide better user experience than the traditional FIFO strategy.