Researcher profile

Denis Parra

Denis Parra contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity

Large language models (LLMs) have revolutionized natural language processing. Understanding their internal mechanisms is crucial for developing more interpretable and optimized architectures. Mechanistic interpretability has led to the development of various methods for assessing layer relevance, with cosine similarity being a widely used tool in the field. On this work, we demonstrate that cosine similarity is a poor proxy for the actual performance degradation caused by layer removal. Our theoretical analysis shows that a layer can exhibit an arbitrarily low cosine similarity score while still being crucial to the model's performance. On the other hand, empirical evidence from a range of LLMs confirms that the correlation between cosine similarity and actual performance degradation is often weak or moderate, leading to misleading interpretations of a transformer's internal mechanisms. We propose a more robust metric for assessing layer relevance: the actual drop in model accuracy resulting from the removal of a layer. Even though it is a computationally costly metric, this approach offers a more accurate picture of layer importance, allowing for more informed pruning strategies and lightweight models. Our findings have significant implications for the development of interpretable LLMs and highlight the need to move beyond cosine similarity in assessing layer relevance.

preprint2022arXiv

A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images

Every year physicians face an increasing demand of image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.

preprint2022arXiv

Graphing else matters: exploiting aspect opinions and ratings in explainable graph-based recommendations

The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, current recommendation methods based on graph embeddings have shown state-of-the-art performance. These methods commonly encode latent rating patterns and content features. Different from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Our approach has the advantage of providing explanations which leverage aspect-based opinions given by users about recommended items. Furthermore, we also provide examples of the applicability of recommendations utilizing aspect opinions as explanations in a visualization dashboard, which allows obtaining information about the most and least liked aspects of similar users obtained from the embeddings of an input graph.

preprint2022arXiv

Inspecting state of the art performance and NLP metrics in image-based medical report generation

Several deep learning architectures have been proposed over the last years to deal with the problem of generating a written report given an imaging exam as input. Most works evaluate the generated reports using standard Natural Language Processing (NLP) metrics (e.g. BLEU, ROUGE), reporting significant progress. In this article, we contrast this progress by comparing state of the art (SOTA) models against weak baselines. We show that simple and even naive approaches yield near SOTA performance on most traditional NLP metrics. We conclude that evaluation methods in this task should be further studied towards correctly measuring clinical accuracy, ideally involving physicians to contribute to this end.

preprint2020arXiv

Analyzing Network Effects on a Fanfiction Community

Since the early days of the Web 2.0, online communities have been growing quickly and have become important part of life for large number of people. In one of these communities, fanfiction.net, users can read and write stories which are adapted, recreated and modified from original famous books, tv series, movies, among others. By following stories and their authors, the fanfiction community creates a social network. Previous research on online communities has shown how features of the social network can help explain the behavior of the community, so we are interested in studying fanfiction's social network as well as its influence in aspects of the community. In particular, in this article we describe several properties of the members of the community, and we also try to discover which factors explain the popularity of the authors. We discover that time since joining fanfiction and the size of the authors' biography, has a negative effect on the authors popularity. Moreover, we show that the users' network metrics help to explain better authors' popularity.

preprint2020arXiv

Interpretable Contextual Team-aware Item Recommendation: Application in Multiplayer Online Battle Arena Games

The video game industry has adopted recommendation systems to boost users interest with a focus on game sales. Other exciting applications within video games are those that help the player make decisions that would maximize their playing experience, which is a desirable feature in real-time strategy video games such as Multiplayer Online Battle Arena (MOBA) like as DotA and LoL. Among these tasks, the recommendation of items is challenging, given both the contextual nature of the game and how it exposes the dependence on the formation of each team. Existing works on this topic do not take advantage of all the available contextual match data and dismiss potentially valuable information. To address this problem we develop TTIR, a contextual recommender model derived from the Transformer neural architecture that suggests a set of items to every team member, based on the contexts of teams and roles that describe the match. TTIR outperforms several approaches and provides interpretable recommendations through visualization of attention weights. Our evaluation indicates that both the Transformer architecture and the contextual information are essential to get the best results for this item recommendation task. Furthermore, a preliminary user survey indicates the usefulness of attention weights for explaining recommendations as well as ideas for future work. The code and dataset are available at: https://github.com/ojedaf/IC-TIR-Lol.

preprint2020arXiv

On Adversarial Examples for Biomedical NLP Tasks

The success of pre-trained word embeddings has motivated its use in tasks in the biomedical domain. The BERT language model has shown remarkable results on standard performance metrics in tasks such as Named Entity Recognition (NER) and Semantic Textual Similarity (STS), which has brought significant progress in the field of NLP. However, it is unclear whether these systems work seemingly well in critical domains, such as legal or medical. For that reason, in this work, we propose an adversarial evaluation scheme on two well-known datasets for medical NER and STS. We propose two types of attacks inspired by natural spelling errors and typos made by humans. We also propose another type of attack that uses synonyms of medical terms. Under these adversarial settings, the accuracy of the models drops significantly, and we quantify the extent of this performance loss. We also show that we can significantly improve the robustness of the models by training them with adversarial examples. We hope our work will motivate the use of adversarial examples to evaluate and develop models with increased robustness for medical tasks.