Researcher profile

Irene Finocchi

Irene Finocchi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - Baseline
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2015arXiv

On data skewness, stragglers, and MapReduce progress indicators

We tackle the problem of predicting the performance of MapReduce applications, designing accurate progress indicators that keep programmers informed on the percentage of completed computation time during the execution of a job. Through extensive experiments, we show that state-of-the-art progress indicators (including the one provided by Hadoop) can be seriously harmed by data skewness, load unbalancing, and straggling tasks. This is mainly due to their implicit assumption that the running time depends linearly on the input size. We thus design a novel profile-guided progress indicator, called NearestFit, that operates without the linear hypothesis assumption and exploits a careful combination of nearest neighbor regression and statistical curve fitting techniques. Our theoretical progress model requires fine-grained profile data, that can be very difficult to manage in practice. To overcome this issue, we resort to computing accurate approximations for some of the quantities used in our model through space- and time-efficient data streaming algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive empirical assessment over the Amazon EC2 platform on a variety of real-world benchmarks shows that NearestFit is practical w.r.t. space and time overheads and that its accuracy is generally very good, even in scenarios where competitors incur non-negligible errors and wide prediction fluctuations. Overall, NearestFit significantly improves the current state-of-art on progress analysis for MapReduce.

preprint2014arXiv

Experimental Evaluation of Algorithms for the Food-Selection Problem

In this paper, we describe the result of our experiments on Algorithms for the Food-Selection Problem, which is the fundamental problem first stated and addressed in the seminal paper \cite{pigout}. Because the key aspect of any experimental evaluation is the \textbf{reproducibility}, we detail deeply the setup of all our experiments, thus leaving to the interested eater the opportunity to reproduce all the results described in this paper. More specifically, we describe all the answers we provided to the questions proposed in \cite{pigout}: Where can I have dinner tonight? What is the typical Roman cuisine that I should (not) miss? Where can I find the best coffee or gelato in town?

preprint2013arXiv

Ball-Larus Path Profiling Across Multiple Loop iterations

Identifying the hottest paths in the control flow graph of a routine can direct optimizations to portions of the code where most resources are consumed. This powerful methodology, called path profiling, was introduced by Ball and Larus in the mid 90s and has received considerable attention in the last 15 years for its practical relevance. A shortcoming of Ball-Larus path profiling was the inability to profile cyclic paths, making it difficult to mine interesting execution patterns that span multiple loop iterations. Previous results, based on rather complex algorithms, have attempted to circumvent this limitation at the price of significant performance losses already for a small number of iterations. In this paper, we present a new approach to multiple iterations path profiling, based on data structures built on top of the original Ball-Larus numbering technique. Our approach allows it to profile all executed paths obtained as a concatenation of up to k Ball-Larus acyclic paths, where k is a user-defined parameter. An extensive experimental investigation on a large variety of Java benchmarks on the Jikes RVM shows that, surprisingly, our approach can be even faster than Ball-Larus due to fewer operations on smaller hash tables, producing compact representations of cyclic paths even for large values of k.

preprint2013arXiv

Multithreaded Input-Sensitive Profiling

Input-sensitive profiling is a recent performance analysis technique that makes it possible to estimate the empirical cost function of individual routines of a program, helping developers understand how performance scales to larger inputs and pinpoint asymptotic bottlenecks in the code. A current limitation of input-sensitive profilers is that they specifically target sequential computations, ignoring any communication between threads. In this paper we show how to overcome this limitation, extending the range of applicability of the original approach to multithreaded applications and to applications that operate on I/O streams. We develop new metrics for automatically estimating the size of the input given to each routine activation, addressing input produced by non-deterministic memory stores performed by other threads as well as by the OS kernel (e.g., in response to I/O or network operations). We provide real case studies, showing that our extension allows it to characterize the behavior of complex applications more precisely than previous approaches. An extensive experimental investigation on a variety of benchmark suites (including the SPEC OMP2012 and the PARSEC benchmarks) shows that our Valgrind-based input-sensitive profiler incurs an overhead comparable to other prominent heavyweight analysis tools, while collecting significantly more performance points from each profiling session and correctly characterizing both thread-induced and external input.

preprint2011arXiv

Reactive Imperative Programming with Dataflow Constraints

Dataflow languages provide natural support for specifying constraints between objects in dynamic applications, where programs need to react efficiently to changes of their environment. Researchers have long investigated how to take advantage of dataflow constraints by embedding them into procedural languages. Previous mixed imperative/dataflow systems, however, require syntactic extensions or libraries of ad hoc data types for binding the imperative program to the dataflow solver. In this paper we propose a novel approach that smoothly combines the two paradigms without placing undue burden on the programmer. In our framework, programmers can define ordinary commands of the host imperative language that enforce constraints between objects stored in "reactive" memory locations. Reactive objects can be of any legal type in the host language, including primitive data types, pointers, arrays, and structures. Constraints are automatically re-executed every time their input memory locations change, letting a program behave like a spreadsheet where the values of some variables depend upon the values of other variables. The constraint solving mechanism is handled transparently by altering the semantics of elementary operations of the host language for reading and modifying objects. We provide a formal semantics and describe a concrete embodiment of our technique into C/C++, showing how to implement it efficiently in conventional platforms using off-the-shelf compilers. We discuss relevant applications to reactive scenarios, including incremental computation, observer design pattern, and data structure repair. The performance of our implementation is compared to ad hoc problem-specific change propagation algorithms and to language-centric approaches such as self-adjusting computation and subject/observer communication mechanisms, showing that the proposed approach is efficient in practice.