Researcher profile

Dario Garcia-Gasulla

Dario Garcia-Gasulla contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
12 topics
4 close collaborators

Published work

10 published items

preprint, 2026, arXiv

LLM Translation of Compiler Intermediate Representation

GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers for cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) appear to be an emerging technology that offers a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples. To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE (as emitted by GCC) to LLVM IR (as emitted by LLVM). The model is trained on paired IRs extracted from C sources and evaluated on the GIMPLE-to-LLVM IR transformation applied to IRs derived from real-world C code and competitive programming problems. To the best of our knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms the accuracy of widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44 percentage points. The proposed transformation supports the integration of LLMs as complementary components within hybrid neuro-symbolic compiler architectures, where models such as IRIS-14B act as interoperability layers enabling cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.

preprint, 2026, arXiv

RuC: HDL-Agnostic Rule Completion Benchmark Generation

Large Language Models (LLMs) have rapidly improved in performance across code-related tasks, making their integration into Register Transfer Level (RTL) development increasingly attractive. Mimicking the behavior of inline code assistants, many benchmarks evaluate LLMs' capabilities in code completion, either assessing the generation of entire hardware modules or the completion of a single line within a module. However, both of these approaches lack the ability to control the granularity of the code-completion sample size and the syntactic range of completions. To overcome these limitations, we present a framework for language-agnostic rule completion (RuC), a grammar-driven, rule-selectable benchmark generator that automatically produces RTL code-completion tasks from a set of input hardware description sources. RuC uses the target Hardware Description Language (HDL) grammar to mask syntactically defined code regions and prompts a model to regenerate them using the surrounding unmasked code as context, enabling a controlled and scalable evaluation of the domain-specific model's code-understanding capabilities, ranging from assignments to the reconstruction of entire logic blocks. We use RuC to generate two SystemVerilog rule-completion benchmarks from the Tiny Tapeout shuttle TT07 and the CVE2 RISC-V core to demonstrate RuC's applicability to a broad range of designs, and conduct a comparative study of the code completion capabilities of modern open-source LLMs across diverse settings. Results indicate that completion performance strongly depends on the model type, the grammatical structure of the masked region, and the prompting strategy. Specifically, the highest scores are obtained with Fill-in-the-Middle (FIM) prompting. These findings highlight the value of grammar-driven, arbitrarily granular benchmarks for meaningful evaluation of LLM capabilities in RTL development workflows.

preprint, 2022, arXiv

Focus! Rating XAI Methods and Finding Biases

AI explainability improves the transparency of models, making them more trustworthy. Such goals are motivated by the emergence of deep learning models, which are obscure by nature; even in the domain of images, where deep learning has succeeded the most, explainability is still poorly assessed. In the field of image recognition many feature attribution methods have been proposed with the purpose of explaining a model's behavior using visual cues. However, no metrics have been established so far to assess and select these methods objectively. In this paper we propose a consistent evaluation score for feature attribution methods -- the Focus -- designed to quantify their coherency to the task. While most previous work adds out-of-distribution noise to samples, we introduce a methodology to add noise from within the distribution. This is done through mosaics of instances from different classes, and the explanations these generate. On those, we compute a visual pseudo-precision metric, Focus. First, we show the robustness of the approach through a set of randomization experiments. Then we use Focus to compare six popular explainability techniques across several CNN architectures and classification datasets. Our results find some methods to be consistently reliable (LRP, GradCAM), while others produce class-agnostic explanations (SmoothGrad, IG). Finally we introduce another application of Focus, using it for the identification and characterization of biases found in models. This empowers bias-management tools, in another small step towards trustworthy AI.

preprint, 2022, arXiv

Healthy Twitter discussions? Time will tell

Studying misinformation and how to deal with unhealthy behaviours within online discussions has recently become an important field of research within social studies. With the rapid development of social media, and the increasing amount of available information and sources, rigorous manual analysis of such discourses has become unfeasible. Many approaches tackle the issue by studying the semantic and syntactic properties of discussions following a supervised approach, for example using natural language processing on a dataset labeled for abusive, fake or bot-generated content. Solutions based on the existence of a ground truth are limited to those domains which may have ground truth. However, within the context of misinformation, it may be difficult or even impossible to assign labels to instances. In this context, we consider the use of temporal dynamic patterns as an indicator of discussion health. Working in a domain for which ground truth was unavailable at the time (early COVID-19 pandemic discussions), we explore the characterization of discussions based on the volume and time of contributions. First we explore the types of discussions in an unsupervised manner, and then characterize these types using the concept of ephemerality, which we formalize. In the end, we discuss the potential use of our ephemerality definition for labeling online discourses based on how desirable, healthy and constructive they are.

preprint, 2022, arXiv

Signs for Ethical AI: A Route Towards Transparency

Today, Artificial Intelligence (AI) has a direct impact on the daily life of billions of people. Being applied to sectors like finance, health, security and advertisement, AI fuels some of the biggest companies and research institutions in the world. Its impact in the near future seems difficult to predict or bound. In contrast to all this power, society remains mostly ignorant of the capabilities and standard practices of AI today. To address this imbalance, improving current interactions between people and AI systems, we propose a transparency scheme to be implemented on any AI system open to the public. The scheme is based on two pillars: Data Privacy and AI Transparency. The first recognizes the relevance of data for AI, and is supported by GDPR. The second considers aspects of AI transparency currently unregulated: AI capabilities, purpose and source. We design this pillar based on ethical principles. For each of the two pillars, we define a three-level display. The first level is based on visual signs, inspired by traffic signs managing the interaction between people and cars, and designed for quick and universal interpretability. The second level uses factsheets, providing limited details. The last level provides access to all available information. After detailing and exemplifying the proposed transparency scheme, we define a set of principles for creating transparent by design software, to be used during the integration of AI components on user-oriented services.

preprint, 2021, arXiv

DOME: Recommendations for supervised machine learning validation in biology

Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for more scrutiny on machine learning performance and possible limitations. Here we present a set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology. Adopting a structured methods description for machine learning based on data, optimization, model, evaluation (DOME) will aim to help both reviewers and readers to better understand and assess the performance and limitations of a method or outcome. The recommendations are formulated as questions to anyone wishing to pursue implementation of a machine learning algorithm. Answers to these questions can be easily included in the supplementary material of published papers.

preprint, 2021, arXiv

The Impact of COVID-19 on Flight Networks

As COVID-19 transmissions spread worldwide, governments have announced and enforced travel restrictions to prevent further infections. Such restrictions have a direct effect on the volume of international flights among these countries, resulting in extensive social and economic costs. To better understand the situation in a quantitative manner, we used the OpenSky Network data to clarify flight patterns and flight densities around the world and observe relationships between flight numbers and new infections, and between flight numbers and the economy (unemployment rate) in Barcelona. We found that the number of daily flights gradually decreased and suddenly dropped 64% during the second half of March 2020 after the US and Europe enacted travel restrictions. We also observed a 51% decrease in the global flight network density during this period. Regarding new COVID-19 cases, the world had an unexpected surge regardless of travel restrictions. Finally, the layoffs for temporary workers in the tourism and airplane business increased 4.3-fold in the weeks following Spain's decision to close its borders.

preprint, 2020, arXiv

Obstruction level detection of sewer videos using convolutional neural networks

Worldwide, sewer networks are designed to transport wastewater to a centralized treatment plant to be treated and returned to the environment. This process is critical for the current society, preventing waterborne illnesses, providing safe drinking water and enhancing general sanitation. To keep a sewer network perfectly operational, sampling inspections are performed constantly to identify obstructions. Typically, a Closed-Circuit Television system is used to record the inside of pipes and report the obstruction level, which may trigger a cleaning operative. Currently, the obstruction level assessment is done manually, which is time-consuming and inconsistent. In this work, we design a methodology to train a Convolutional Neural Network for identifying the level of obstruction in pipes, thus reducing the human effort required on such a frequent and repetitive task. We gathered a database of videos that are explored and adapted to generate useful frames to feed into the model. Our resulting classifier obtains deployment-ready performance. To validate the consistency of the approach and its industrial applicability, we integrate the Layer-wise Relevance Propagation explainability technique, which enables us to further understand the behavior of the neural network for this task. In the end, the proposed system can provide higher speed, accuracy, and consistency in the process of sewer examination. Our analysis also uncovers some guidelines on how to further improve the quality of the data gathering methodology.

preprint, 2020, arXiv

Private Sources of Mobility Data Under COVID-19

The COVID-19 pandemic is changing the world in unprecedented and unpredictable ways. Human mobility is at the epicenter of that change, as the greatest facilitator for the spread of the virus. To study the change in mobility, to evaluate the efficiency of mobility restriction policies, and to facilitate a better response to possible future crises, we need to properly understand all mobility data sources at our disposal. Our work is dedicated to the study of private mobility sources, gathered and released by large technological companies. This data is of special interest because, unlike most public sources, it is focused on people, not transportation means; i.e., its unit of measurement is the closest thing to a person in a western society: a phone. Furthermore, the sample of society they cover is large and representative. On the other hand, this sort of data is not directly accessible for anonymity reasons. Thus, properly interpreting its patterns demands caution. Aware of that, we set forth to explore the behavior and inter-relations of private sources of mobility data in the context of Spain. This country represents a good experimental setting because of its large and fast pandemic peak, and for its implementation of a sustained, generalized lockdown. We find private mobility sources to be both correlated and complementary. Using them, we evaluate the efficiency of implemented policies, and provide insights into what new normal means in Spain.

preprint, 2020, arXiv

What are We Depressed about When We Talk about COVID19: Mental Health Analysis on Tweets Using Natural Language Processing

The outbreak of coronavirus disease 2019 (COVID-19) recently has affected human life to a great extent. Besides direct physical and economic threats, the pandemic also indirectly impacts people's mental health conditions, which can be overwhelming but difficult to measure. The problem may come from various reasons such as unemployment status, stay-at-home policy, fear of the virus, and so forth. In this work, we focus on applying natural language processing (NLP) techniques to analyze tweets in terms of mental health. We trained deep models that classify each tweet into the following emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust. We build the EmoCT (Emotion-Covid19-Tweet) dataset for the training purpose by manually labeling 1,000 English tweets. Furthermore, we propose and compare two methods to find out the reasons that are causing sadness and fear.