Source author record

Marcelo Naiouf

Marcelo Naiouf appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

4works
4topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Migrating CUDA to oneAPI: A Smith-Waterman Case Study

To face the programming challenges related to heterogeneous computing, Intel recently introduced oneAPI, a new programming environment that allows code developed in Data Parallel C++ (DPC++) language to be run on different devices such as CPUs, GPUs, FPGAs, among others. To tackle CUDA-based legacy codes, oneAPI provides a compatibility tool (dpct) that facilitates the migration to DPC++. Due to the large amount of existing CUDA-based software in the bioinformatics context, this paper presents our experiences porting SW#db, a well-known sequence alignment tool, to DPC++ using dpct. From the experimental work, it was possible to prove the usefulness of dpct for SW#db code migration and the cross-GPU vendor, cross-architecture portability of the migrated DPC++ code. In addition, the performance results showed that the migrated DPC++ code reports similar efficiency rates to its CUDA-native counterpart or even better in some tests (approximately +5%).

preprint2020arXiv

Soft Errors Detection and Automatic Recovery based on Replication combined with different Levels of Checkpointing

Handling faults is a growing concern in HPC. In future exascale systems, it is projected that silent undetected errors will occur several times a day, increasing the occurrence of corrupted results. In this article, we propose SEDAR, which is a methodology that improves system reliability against transient faults when running parallel message-passing applications. Our approach, based on process replication for detection, combined with different levels of checkpointing for automatic recovery, has the goal of helping users of scientific applications to obtain executions with correct results. SEDAR is structured in three levels: (1) only detection and safe-stop with notification; (2) recovery based on multiple system-level checkpoints; and (3) recovery based on a single valid user-level checkpoint. As each of these variants supplies a particular coverage but involves limitations and implementation costs, SEDAR can be adapted to the needs of the system. In this work, a description of the methodology is presented and the temporal behavior of employing each SEDAR strategy is mathematically described, both in the absence and presence of faults. A model that considers all the fault scenarios on a test application is introduced to show the validity of the detection and recovery mechanisms. An overhead evaluation of each variant is performed with applications involving different communication patterns; this is also used to extract guidelines about when it is beneficial to employ each SEDAR protection level. As a result, we show its efficacy and viability to tolerate transient faults in target HPC environments.

preprint2016arXiv

Quantum Enhanced Correlation Matrix Memories via States Orthogonalisation

This paper introduces a Quantum Correlation Matrix Memory (QCMM) and Enhanced QCMM (EQCMM), which are useful to work with quantum memories. A version of classical Gram-Schmidt orthogonalisation process in Dirac notation (called Quantum Orthogonalisation Process: QOP) is presented to convert a non-orthonormal quantum basis, i.e., a set of non-orthonormal quantum vectors (called qudits) to an orthonormal quantum basis, i.e., a set of orthonormal quantum qudits. This work shows that it is possible to improve the performance of QCMM thanks QOP algorithm. Besides, the EQCMM algorithm has a lot of additional fields of applica-tions, e.g.: Steganography, as a replacement Hopfield Net-works, Bi-level image processing, etc. Finally, it is important to mention that the EQCMM is an extremely easy to implement in any firmware.

preprint2010arXiv

Automatic Mapping Tasks to Cores - Evaluating AMTHA Algorithm in Multicore Architectures

The AMTHA (Automatic Mapping Task on Heterogeneous Architectures) algorithm for task-to-processors assignment and the MPAHA (Model of Parallel Algorithms on Heterogeneous Architectures) model are presented. The use of AMTHA is analyzed for multicore processor-based architectures, considering the communication model among processes in use. The results obtained in the tests carried out are presented, comparing the real execution times on multicores of a set of synthetic applications with the predictions obtained with AMTHA. Finally current lines of research are presented, focusing on clusters of multicores and hybrid programming paradigms.