Source author record

Felipe A. Cruz

Felipe A. Cruz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Mathematical Software physics.comp-ph

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2014arXiv

A GRASS GIS parallel module for radio-propagation predictions

Geographical information systems are ideal candidates for the application of parallel programming techniques, mainly because they usually handle large data sets. To help us deal with complex calculations over such data sets, we investigated the performance constraints of a classic master-worker parallel paradigm over a message-passing communication model. To this end, we present a new approach that employs an external database in order to improve the calculation/communication overlap, thus reducing the idle times for the worker processes. The presented approach is implemented as part of a parallel radio-coverage prediction tool for the GRASS environment. The prediction calculation employs digital elevation models and land-usage data in order to analyze the radio coverage of a geographical area. We provide an extended analysis of the experimental results, which are based on real data from an LTE network currently deployed in Slovenia. Based on the results of the experiments, which were performed on a computer cluster, the new approach exhibits better scalability than the traditional master-worker approach. We successfully tackled real-world data sets, while greatly reducing the processing time and saturating the hardware utilization.

preprint2011arXiv

How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

Computing on graphics processors is maybe one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. Like then, the opportunity comes with challenges. The formulation of scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign for attaining performance on gpus. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the gpu. The end result has been gpu kernels that run at over 500 Gigaflops on one nvidia Tesla C1060 card, thereby reaching close to practical peak. We can confidently say that gpu computing is not just a vogue, it is truly an irresistible trend in high-performance computing.

preprint2009arXiv

PetFMM--A dynamically load-balancing parallel fast multipole library

Fast algorithms for the computation of $N$-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this last class belongs the well-known fast multipole method (FMM), which offers O(N) complexity. This paper presents an extensible parallel library for $N$-body interactions utilizing the FMM algorithm, built on the framework of PETSc. A prominent feature of this library is that it is designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based $N$-body algorithms in parallel, including both work estimates and communications estimates. With this model, we are able to implement a method to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communications. Using a client application that performs the calculation of velocity induced by $N$ vortex particles, ample verification and testing of the library was performed. Strong scaling results are presented with close to a million particles in up to 64 processors, including both speedup and parallel efficiency. The library is currently able to achieve over 85% parallel efficiency for 64 processors. The software library is open source under the PETSc license; this guarantees the maximum impact to the scientific community and encourages peer-based collaboration for the extensions and applications.

preprint2008arXiv

Characterization of the errors of the FMM in particle simulations

The Fast Multipole Method (FMM) offers an acceleration for pairwise interaction calculation, known as $N$-body problems, from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ with $N$ particles. This has brought dramatic increase in the capability of particle simulations in many application areas, such as electrostatics, particle formulations of fluid mechanics, and others. Although the literature on the subject provides theoretical error bounds for the FMM approximation, there are not many reports of the measured errors in a suite of computational experiments. We have performed such an experimental investigation, and summarized the results of about 1000 calculations using the FMM algorithm, to characterize the accuracy of the method in relation with the different parameters available to the user. In addition to the more standard diagnostic of the maximum error, we supply illustrations of the spatial distribution of the errors, which offers visual evidence of all the contributing factors to the overall approximation accuracy: multipole expansion, local expansion, hierarchical spatial decomposition (interaction lists, local domain, far domain). This presentation is a contribution to any researcher wishing to incorporate the FMM acceleration to their application code, as it aids in understanding where accuracy is gained or compromised.