Source author record

David Castells-Rufas

David Castells-Rufas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing · Computer Vision · Performance · Programming Languages · eess.IV

Catalog footprint

What is connected

9 works

5 topics

4 close collaborators


preprint · 2026 · arXiv

MetaFormer-driven Encoding Network for Robust Medical Semantic Segmentation

Semantic segmentation is crucial for medical image analysis, enabling precise disease diagnosis and treatment planning. However, many advanced models employ complex architectures, limiting their use in resource-constrained clinical settings. This paper proposes MFEnNet, an efficient medical image segmentation framework that incorporates MetaFormer in the encoding phase of the U-Net backbone. MetaFormer, an architectural abstraction of vision transformers, provides a versatile alternative to convolutional neural networks by transforming tokenized image patches into sequences for global context modeling. To mitigate the substantial computational cost associated with self-attention, the proposed framework replaces conventional transformer modules with pooling transformer blocks, thereby achieving effective global feature aggregation at reduced complexity. In addition, Swish activation is used to achieve smoother gradients and faster convergence, while spatial pyramid pooling is incorporated at the bottleneck to improve multi-scale feature extraction. Comprehensive experiments on different medical segmentation benchmarks demonstrate that the proposed MFEnNet approach attains competitive accuracy while significantly lowering computational cost compared to state-of-the-art models. The source code for this work is available at https://github.com/tranleanh/mfennet.
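For readers unfamiliar with pooling-based MetaFormer blocks, here is a minimal sketch of two ingredients named in the abstract, under stated assumptions. It is not the MFEnNet code (that is in the linked repository). It follows the PoolFormer formulation, in which the token mixer is a stride-1 average pool minus the input. It uses plain Python on a single-channel feature map with no normalization or channel MLP. The function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x); smooth and non-monotonic near 0."""
    return x * sigmoid(beta * x)


def pool_token_mixer(grid: list[list[float]], pool_size: int = 3) -> list[list[float]]:
    """PoolFormer-style token mixer on one H x W channel (illustrative sketch).

    Each output token is the mean of its pool_size x pool_size neighbourhood
    (stride 1, out-of-bounds positions excluded from the mean) minus the token
    itself, so global context is spread by stacking blocks, not by attention.
    """
    h, w = len(grid), len(grid[0])
    r = pool_size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += grid[ii][jj]
                        count += 1
            out[i][j] = total / count - grid[i][j]
    return out
```

In a full MetaFormer block the mixer output is added back residually, as x + mixer(norm(x)). Pooling makes the mixing cost linear in the number of tokens instead of the quadratic cost of self-attention.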

preprint · 2022 · arXiv

BronchoPose: an analysis of data and model configuration for vision-based bronchoscopy pose estimation

Vision-based bronchoscopy (VB) models require the registration of the virtual lung model with the frames from the video bronchoscopy to provide effective guidance during the biopsy. The registration can be achieved by either tracking the position and orientation of the bronchoscopy camera or by calibrating its deviation from the pose (position and orientation) simulated in the virtual lung model. Recent advances in neural networks and temporal image processing have provided new opportunities for guided bronchoscopy. However, such progress has been hindered by the lack of comparative experimental conditions. In the present paper, we share a novel synthetic dataset allowing for a fair comparison of methods. Moreover, this paper investigates several neural network architectures for the learning of temporal information at different levels of subject personalization. In order to improve orientation measurement, we also present a standardized comparison framework and a novel metric for camera orientation learning. Results on the dataset show that the proposed metric and architectures, as well as the standardized conditions, provide notable improvements to current state-of-the-art camera pose estimation in video bronchoscopy.
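The abstract does not define its new orientation metric. As a point of reference only, a common baseline for camera orientation error is the geodesic angle between the predicted and ground-truth rotations. The sketch below computes it from quaternions in (w, x, y, z) order. The function name and quaternion convention are assumptions, and this is not the BronchoPose metric.

```python
import math


def quaternion_angle_deg(q1, q2):
    """Geodesic angle in degrees between two rotations given as quaternions (w, x, y, z).

    q and -q encode the same rotation, so the absolute value of the
    normalized dot product is used; the result lies in [0, 180].
    """
    n1 = math.sqrt(sum(c * c for c in q1))
    n2 = math.sqrt(sum(c * c for c in q2))
    dot = abs(sum(a * b for a, b in zip(q1, q2))) / (n1 * n2)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))
```

Near zero error the arccosine is sensitive to rounding, so very small angles are only accurate to roughly 1e-5 degrees.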

preprint · 2016 · arXiv

Energy Efficiency of Many-Soft-Core Processors

The growing capacity of integration makes it possible to instantiate hundreds of soft-core processors in a single FPGA to create a reconfigurable multiprocessing system. FPGAs have lately been shown to offer higher energy efficiency than alternative platforms such as CPUs and GPGPUs for certain workloads, and they are increasingly used in data centers. In this paper we investigate whether many-soft-core processors can achieve similar levels of energy efficiency while providing a general-purpose environment that is easier to program and that allows other applications to run without reconfiguring the device. With a simple application example, we create a reconfigurable multiprocessing system that achieves an energy efficiency 58 times higher than a recent ultra-low-power processor and 124 times higher than a recent high-performance GPGPU.
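The abstract does not spell out how energy efficiency is measured. Assuming the usual definition of useful work per unit of energy, with $N_{\text{ops}}$ operations completed in time $t$ at average power $\bar{P}$, and writing ULP for the ultra-low-power processor, the reported factors read as follows:

```latex
\eta = \frac{N_{\text{ops}}}{E} = \frac{N_{\text{ops}}}{\bar{P}\,t} = \frac{N_{\text{ops}}/t}{\bar{P}},
\qquad
\frac{\eta_{\text{soft-core}}}{\eta_{\text{ULP}}} = 58,
\qquad
\frac{\eta_{\text{soft-core}}}{\eta_{\text{GPGPU}}} = 124
```

Under this definition, the same workload on the many-soft-core system consumes roughly 1/58 and 1/124 of the energy, respectively.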

preprint · 2016 · arXiv

Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2016

Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2016. Prague, January 18th. Collocated with HIPEAC 2016 Conference.

preprint · 2015 · arXiv

OMP2HMPP: Compiler Framework for Energy Performance Trade-off Analysis of Automatically Generated Codes

We present OMP2HMPP, a tool that first automatically translates OpenMP code into various possible HMPP transformations. In a second step, OMP2HMPP executes all variants to obtain the performance and power consumption of each transformation. The resulting trade-off can be used to choose the most convenient version. After running the tool on a set of codes from the Polybench benchmark suite, we show that the best automatic transformation is equivalent to a manual one done by an expert. Compared with the original OpenMP code running on two quad-core processors, we obtain an average speed-up of 31x and a 5.86x improvement in operations per watt.
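One way to read "the resulting trade-off" is as a Pareto front over the measured variants. A variant is kept only if no other variant is at least as fast and at least as energy-frugal while being strictly better on one of the two. The sketch below assumes each generated variant has been reduced to a (time, energy) pair. The Variant type, its fields and the example numbers are hypothetical, and the abstract does not say that OMP2HMPP selects variants this way.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str        # hypothetical label for one generated code variant
    time_s: float    # measured execution time in seconds (lower is better)
    energy_j: float  # measured energy in joules (lower is better)


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.time_s <= b.time_s and a.energy_j <= b.energy_j
    better = a.time_s < b.time_s or a.energy_j < b.energy_j
    return no_worse and better


def pareto_front(variants: list[Variant]) -> list[Variant]:
    """Keep the variants that no other variant dominates, in input order."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    measured = [
        Variant("A", 2.0, 50.0),
        Variant("B", 1.5, 70.0),
        Variant("C", 2.5, 60.0),  # slower and costlier than A, so dropped
        Variant("D", 1.0, 90.0),
    ]
    print([v.name for v in pareto_front(measured)])  # prints ['A', 'B', 'D']
```

Picking one version from the front then depends on whether the user weights speed or energy more heavily.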

preprint · 2015 · arXiv

OMP2MPI: Automatic MPI code generation from OpenMP programs

In this paper, we present OMP2MPI, a tool that automatically generates MPI source code from OpenMP. With this transformation, the original program can be adapted to exploit a larger number of processors, going beyond the limits of a single node on large HPC clusters. The transformation can also be useful for adapting the source code to execute on distributed-memory many-cores with message-passing support. In addition, the resulting MPI code can be used as a starting point that software engineers can optimize further. The transformation process focuses on detecting OpenMP parallel loops and distributing them in a master/worker pattern. A set of micro-benchmarks has been used to verify the correctness of the transformation and to measure the resulting performance. Surprisingly, not only is the automatically generated code correct by construction, but it also often performs faster, even when executed with MPI.
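To illustrate the master/worker distribution of a parallel loop, the sketch below uses Python threads in place of MPI ranks and queue.Queue in place of message passing. It assumes a loop with independent iterations whose results are summed. The function name, chunk size and worker count are hypothetical. The code is not OMP2MPI output, which is MPI source code.

```python
import queue
import threading


def master_worker_sum(f, n, num_workers=4, chunk=8):
    """Compute sum(f(i) for i in range(n)) with a master/worker pattern.

    The master splits the iteration space into chunks and places them on a
    task queue; each worker pulls chunks, computes a partial sum, and sends
    it back on a result queue, which the master then reduces.
    """
    tasks = queue.Queue()
    results = queue.Queue()

    # Master: enqueue iteration ranges [lo, hi).
    num_chunks = 0
    for start in range(0, n, chunk):
        tasks.put((start, min(start + chunk, n)))
        num_chunks += 1
    for _ in range(num_workers):
        tasks.put(None)  # one stop message per worker, queued after all work

    def worker():
        while True:
            item = tasks.get()
            if item is None:
                return
            lo, hi = item
            results.put(sum(f(i) for i in range(lo, hi)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Master: reduce exactly one partial result per chunk.
    return sum(results.get() for _ in range(num_chunks))
```

In MPI code the same roles would typically be played by one master rank that sends iteration ranges and receives partial results, and worker ranks that loop until they receive a stop message.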

preprint · 2015 · arXiv

Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2015

Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2015. Amsterdam, January 21st. Collocated with HIPEAC 2015 Conference.

preprint · 2014 · arXiv

Fast Trace Generation of Many-Core Embedded Systems with Native Simulation

Embedded software development and optimization are complex tasks. The late availability of hardware platforms, their usually low visibility and controllability, and their tight resource constraints make early performance estimation an attractive alternative to using the final execution platform. With early performance estimation, software development can progress even when the real hardware is not yet available or is too complex to interact with. In this paper, we present how the native simulation framework SCoPE is extended to generate OTF trace files. These trace files can later be visualized with trace visualization tools, which until recently were used only to optimize HPC workloads, so that developers can iterate on the software during development.

preprint · 2014 · arXiv

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

High-performance computing is increasingly based on heterogeneous architectures, and GPGPUs have become one of their main building blocks, such as the recently emerged Mali GPU in embedded systems or NVIDIA GPUs in HPC servers. For both kinds of GPGPU, programming can become a hurdle that limits their adoption, since the programmer has to learn the hardware capabilities and the language needed to work with them. We present OMP2HMPP, a tool that automatically translates high-level C source code (OpenMP) into HMPP. The generated version rarely differs from a hand-coded HMPP version and provides an important speedup, near 113%, which could later be improved by hand-coded CUDA. Owing to the commonalities between them, the generated code can be ported both to HPC servers and to embedded GPUs.
