Researcher profile

Matthias Samwald

Matthias Samwald contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

A global analysis of metrics used for measuring performance in natural language processing

Measuring the performance of natural language processing models is challenging. Traditionally used metrics, such as BLEU and ROUGE, originally devised for machine translation and summarization, have been shown to suffer from low correlation with human judgment and a lack of transferability to other tasks and languages. In the past 15 years, a wide range of alternative metrics have been proposed. However, it is unclear to what extent this has had an impact on NLP benchmarking efforts. Here we provide the first large-scale cross-sectional analysis of metrics used for measuring performance in natural language processing. We curated, mapped and systematized more than 3500 machine learning model performance results from the open repository 'Papers with Code' to enable a global and comprehensive analysis. Our results suggest that the large majority of natural language processing metrics currently used have properties that may result in an inadequate reflection of a models' performance. Furthermore, we found that ambiguities and inconsistencies in the reporting of metrics may lead to difficulties in interpreting and comparing model performances, impairing transparency and reproducibility in NLP research.

preprint2022arXiv

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Training and evaluating language models increasingly requires the construction of meta-datasets --diverse collections of curated data with clear provenance. Natural language prompting has recently lead to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled biomedical datasets are significantly underrepresented in popular data hubs. To address this challenge, we introduce BigBIO a community library of 126+ biomedical NLP datasets, currently covering 12 task categories and 10+ languages. BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata, and is compatible with current platforms for prompt engineering and end-to-end few/zero shot language model evaluation. We discuss our process for task schema harmonization, data auditing, contribution guidelines, and outline two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale, multi-task learning. BigBIO is an ongoing community effort and is available at https://github.com/bigscience-workshop/biomedical

preprint2022arXiv

Deep Learning, Natural Language Processing, and Explainable Artificial Intelligence in the Biomedical Domain

In this article, we first give an introduction to artificial intelligence and its applications in biology and medicine in Section 1. Deep learning methods are then described in Section 2. We narrow down the focus of the study on textual data in Section 3, where natural language processing and its applications in the biomedical domain are described. In Section 4, we give an introduction to explainable artificial intelligence and discuss the importance of explainability of artificial intelligence systems, especially in the biomedical domain.

preprint2022arXiv

GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain

Deep neural language models have set new breakthroughs in many tasks of Natural Language Processing (NLP). Recent work has shown that deep transformer language models (pretrained on large amounts of texts) can achieve high levels of task-specific few-shot performance comparable to state-of-the-art models. However, the ability of these large language models in few-shot transfer learning has not yet been explored in the biomedical domain. We investigated the performance of two powerful transformer language models, i.e. GPT-3 and BioBERT, in few-shot settings on various biomedical NLP tasks. The experimental results showed that, to a great extent, both the models underperform a language model fine-tuned on the full training data. Although GPT-3 had already achieved near state-of-the-art results in few-shot knowledge transfer on open-domain NLP tasks, it could not perform as effectively as BioBERT, which is orders of magnitude smaller than GPT-3. Regarding that BioBERT was already pretrained on large biomedical text corpora, our study suggests that language models may largely benefit from in-domain pretraining in task-specific few-shot learning. However, in-domain pretraining seems not to be sufficient; novel pretraining and few-shot learning strategies are required in the biomedical NLP domain.

preprint2020arXiv

Benchmarking neural embeddings for link prediction in knowledge graphs under semantic and structural changes

Recently, link prediction algorithms based on neural embeddings have gained tremendous popularity in the Semantic Web community, and are extensively used for knowledge graph completion. While algorithmic advances have strongly focused on efficient ways of learning embeddings, fewer attention has been drawn to the different ways their performance and robustness can be evaluated. In this work we propose an open-source evaluation pipeline, which benchmarks the accuracy of neural embeddings in situations where knowledge graphs may experience semantic and structural changes. We define relation-centric connectivity measures that allow us to connect the link prediction capacity to the structure of the knowledge graph. Such an evaluation pipeline is especially important to simulate the accuracy of embeddings for knowledge graphs that are expected to be frequently updated.

preprint2020arXiv

Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-based Modules

Large ontologies still pose serious challenges to state-of-the-art ontology alignment systems. In this paper we present an approach that combines a neural embedding model and logic-based modules to accurately divide an input ontology matching task into smaller and more tractable matching (sub)tasks. We have conducted a comprehensive evaluation using the datasets of the Ontology Alignment Evaluation Initiative. The results are encouraging and suggest that the proposed method is adequate in practice and can be integrated within the workflow of systems unable to cope with very large ontologies.

preprint2020arXiv

Explaining Black-box Models for Biomedical Text Classification

In this paper, we propose a novel method named Biomedical Confident Itemsets Explanation (BioCIE), aiming at post-hoc explanation of black-box machine learning models for biomedical text classification. Using sources of domain knowledge and a confident itemset mining method, BioCIE discretizes the decision space of a black-box into smaller subspaces and extracts semantic relationships between the input text and class labels in different subspaces. Confident itemsets discover how biomedical concepts are related to class labels in the black-box's decision space. BioCIE uses the itemsets to approximate the black-box's behavior for individual predictions. Optimizing fidelity, interpretability, and coverage measures, BioCIE produces class-wise explanations that represent decision boundaries of the black-box. Results of evaluations on various biomedical text classification tasks and black-box models demonstrated that BioCIE can outperform perturbation-based and decision set methods in terms of producing concise, accurate, and interpretable explanations. BioCIE improved the fidelity of instance-wise and class-wise explanations by 11.6% and 7.5%, respectively. It also improved the interpretability of explanations by 8%. BioCIE can be effectively used to explain how a black-box biomedical text classification model semantically relates input texts to class labels. The source code and supplementary material are available at https://github.com/mmoradi-iut/BioCIE.

preprint2020arXiv

OpenBioLink: A benchmarking framework for large-scale biomedical link prediction

SUMMARY: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results. AVAILABILITY AND IMPLEMENTATION: Source code, data and supplementary files are openly available at https://github.com/OpenBioLink/OpenBioLink CONTACT: matthias.samwald ((at)) meduniwien.ac.at

preprint2020arXiv

Post-hoc explanation of black-box classifiers using confident itemsets

Black-box Artificial Intelligence (AI) methods, e.g. deep neural networks, have been widely utilized to build predictive models that can extract complex relationships in a dataset and make predictions for new unseen data records. However, it is difficult to trust decisions made by such methods since their inner working and decision logic is hidden from the user. Explainable Artificial Intelligence (XAI) refers to systems that try to explain how a black-box AI model produces its outcomes. Post-hoc XAI methods approximate the behavior of a black-box by extracting relationships between feature values and the predictions. Perturbation-based and decision set methods are among commonly used post-hoc XAI systems. The former explanators rely on random perturbations of data records to build local or global linear models that explain individual predictions or the whole model. The latter explanators use those feature values that appear more frequently to construct a set of decision rules that produces the same outcomes as the target black-box. However, these two classes of XAI methods have some limitations. Random perturbations do not take into account the distribution of feature values in different subspaces, leading to misleading approximations. Decision sets only pay attention to frequent feature values and miss many important correlations between features and class labels that appear less frequently but accurately represent decision boundaries of the model. In this paper, we address the above challenges by proposing an explanation method named Confident Itemsets Explanation (CIE). We introduce confident itemsets, a set of feature values that are highly correlated to a specific class label. CIE utilizes confident itemsets to discretize the whole decision space of a model to smaller subspaces.