Source author record

German Rigau

German Rigau appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation and Language, Artificial Intelligence, Logic in Computer Science, cmp-lg

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators


preprint, 2021, arXiv

Ask2Transformers: Zero-Shot Domain labelling with Pre-trained Language Models

In this paper we present a system that exploits different pre-trained Language Models for assigning domain labels to WordNet synsets without any kind of supervision. Furthermore, the system is not restricted to use a particular set of domain labels. We exploit the knowledge encoded within different off-the-shelf pre-trained Language Models and task formulations to infer the domain label of a particular WordNet definition. The proposed zero-shot system achieves a new state-of-the-art on the English dataset used in the evaluation.

preprint, 2021, arXiv

Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

Popular social media networks provide the perfect environment to study the opinions and attitudes expressed by users. While interactions in social media such as Twitter occur in many natural languages, research on stance detection (the position or attitude expressed with respect to a specific topic) within the Natural Language Processing field has largely been done for English. Although some efforts have recently been made to develop annotated data in other languages, there is a telling lack of resources to facilitate multilingual and cross-lingual research on stance detection. This is partially due to the fact that manually annotating a corpus of social media texts is a difficult, slow and costly process. Furthermore, as stance is a highly domain- and topic-specific phenomenon, the need for annotated data is especially demanding. As a result, most of the manually labeled resources are hindered by their relatively small size and skewed class distribution. This paper presents a method to obtain multilingual datasets for stance detection in Twitter. Instead of manually annotating on a per-tweet basis, we leverage user-based information to semi-automatically label large amounts of tweets. Empirical monolingual and cross-lingual experimentation and qualitative analysis show that our method helps to overcome the aforementioned difficulties in building large, balanced and multilingual labeled corpora. We believe that our method can be easily adapted to generate labeled social media data for other Natural Language Processing tasks and domains.
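The idea of labelling tweets from user-level information rather than one tweet at a time can be illustrated with a minimal sketch. It assumes that a mapping from users to stance labels is already available; the abstract does not say how users are categorised, so the function name `label_tweets_by_user`, the user identifiers and the labels `FAVOR` and `AGAINST` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Tweet = Tuple[str, str]  # (user_id, text)


def label_tweets_by_user(
    tweets: List[Tweet], user_stance: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Propagate each user's stance label to all of that user's tweets.

    Tweets from users without a known stance are skipped, so the output
    contains only (text, stance) pairs for categorised users.
    This is an illustrative assumption, not the authors' procedure.
    """
    labeled = []
    for user_id, text in tweets:
        stance = user_stance.get(user_id)
        if stance is not None:
            labeled.append((text, stance))
    return labeled


# Hypothetical toy data
tweets = [("u1", "first tweet"), ("u2", "second tweet"), ("u3", "third tweet")]
user_stance = {"u1": "FAVOR", "u2": "AGAINST"}
labeled = label_tweets_by_user(tweets, user_stance)
```

In this toy setting one user-level decision labels every tweet by that user, which is why such an approach could scale to large corpora more cheaply than per-tweet annotation.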

preprint, 2020, arXiv

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning

Most commonly, the Open World Assumption is adopted as a standard strategy for the design, construction and use of ontologies. This strategy limits the inferencing capabilities of any system because non-asserted statements (missing knowledge) could be assumed to be alternatively true or false. As we will demonstrate, this is especially the case for first-order logic (FOL) ontologies, where non-asserted statements are nowadays one of the main obstacles to their practical application in automated commonsense reasoning tasks. In this paper, we investigate the application of the Closed World Assumption (CWA) to enable a better exploitation of FOL ontologies by using state-of-the-art automated theorem provers. To that end, we explore different CWA formulations for the structural knowledge encoded in a FOL translation of the SUMO ontology, discovering that almost 30% of the structural knowledge is missing. We evaluate these formulations in practical experiments using a very large commonsense benchmark obtained from WordNet through its mapping to SUMO. The results show that the competency of the ontology improves by more than 50% when reasoning under the CWA. Thus, applying the CWA automatically to FOL ontologies reduces their ambiguity, and more commonsense questions can be answered.
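The difference between the two assumptions can be shown with a toy fact base. The sketch below is only a set lookup, not the FOL formulations or theorem-prover setup used in the paper, and the facts and function names (`ask_owa`, `ask_cwa`) are hypothetical. Under the Open World Assumption a non-asserted statement stays unknown; under the Closed World Assumption it is taken to be false.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (predicate, first argument, second argument)

# Hypothetical toy knowledge base, not taken from SUMO
FACTS: Set[Fact] = {
    ("subclass", "Human", "Primate"),
    ("disjoint", "Plant", "Animal"),
}
NEGATED_FACTS: Set[Fact] = {
    ("subclass", "Plant", "Animal"),
}


def ask_owa(query: Fact) -> Optional[bool]:
    """Open world: True if asserted, False if explicitly negated, else unknown."""
    if query in FACTS:
        return True
    if query in NEGATED_FACTS:
        return False
    return None  # missing knowledge: neither true nor false can be concluded


def ask_cwa(query: Fact) -> bool:
    """Closed world: anything that is not asserted is assumed to be false."""
    return query in FACTS


query = ("disjoint", "Human", "Plant")
owa_answer = ask_owa(query)  # None: the question cannot be decided
cwa_answer = ask_cwa(query)  # False: the statement is not asserted
```

The query that is undecidable under the open-world reading receives a definite answer under the closed-world reading, which matches the abstract's point that more questions can be answered under the CWA.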

preprint, 2020, arXiv

Multilingual Stance Detection: The Catalonia Independence Corpus

Stance detection aims to determine the attitude of a given text with respect to a specific topic or claim. While stance detection has been fairly well researched in recent years, most of the work has focused on English. This is mainly due to the relative lack of annotated data in other languages. The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish. Unfortunately, the TW-10 Catalan subset is extremely imbalanced. This paper addresses these issues by presenting a new multilingual dataset for stance detection in Twitter for the Catalan and Spanish languages, with the aim of facilitating research on stance detection in multilingual and cross-lingual settings. The dataset is annotated with stance towards one topic, namely, the independence of Catalonia. We also provide a semi-automatic method to annotate the dataset based on a categorization of Twitter users. We experiment on the new corpus with a number of supervised approaches, including linear classifiers and deep learning methods. Comparison of our new corpus with the TW-10 dataset shows both the benefits and potential of a well-balanced corpus for multilingual and cross-lingual research on stance detection. Finally, we establish new state-of-the-art results on the TW-10 dataset, both for Catalan and Spanish.

preprint, 2020, arXiv

NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts

This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of ongoing research and currently consists of 29,682 sentences obtained from anonymised health records annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish, and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the annotated dataset. As far as we know, NUBes is the largest publicly available corpus for negation in Spanish and the first that also incorporates the annotation of speculation cues, scopes, and events.

preprint, 2015, arXiv

Evaluating the Competency of a First-Order Ontology

We report on the results of evaluating the competency of a first-order ontology for its use with automated theorem provers (ATPs). The evaluation follows the adaptation of the methodology based on competency questions (CQs) [Grüninger&Fox,1995] to the framework of first-order logic, which is presented in [Álvez&Lucio&Rigau,2015], and is applied to Adimen-SUMO [Álvez&Lucio&Rigau,2015]. The set of CQs used for this evaluation has been automatically generated from a small set of semantic patterns and the mapping of WordNet to SUMO. Analysing the results, we can conclude that it is feasible to use ATPs for working with Adimen-SUMO v2.4, enabling the resolution of goals by means of performing non-trivial inferences.

preprint, 2015, arXiv

Improving the Competency of First-Order Ontologies

We introduce a new framework to evaluate and improve first-order (FO) ontologies using automated theorem provers (ATPs) on the basis of competency questions (CQs). Our framework includes both the adaptation of a methodology for evaluating ontologies to the framework of first-order logic and a new set of non-trivial CQs designed to evaluate FO versions of SUMO, which significantly extends the very small set of CQs proposed in the literature. Most of these new CQs have been automatically generated from a small set of patterns and the mapping of WordNet to SUMO. Applying our framework, we demonstrate that Adimen-SUMO v2.2 outperforms TPTP-SUMO. In addition, using the feedback provided by ATPs, we have produced an improved version of Adimen-SUMO (v2.4). This new version outperforms the previous ones in terms of competency. For instance, "Humans can reason" is automatically inferred from Adimen-SUMO v2.4, while it is deducible from neither TPTP-SUMO nor Adimen-SUMO v2.2.

preprint, 1995, arXiv

Disambiguating bilingual nominal entries against WordNet

This paper explores the acquisition of conceptual knowledge from bilingual dictionaries (French/English, Spanish/English and English/Spanish) using a pre-existing broad-coverage Lexical Knowledge Base (LKB), WordNet. Bilingual nominal entries are disambiguated against WordNet, thereby linking the bilingual dictionaries to WordNet and yielding a multilingual LKB (MLKB). The resulting MLKB has the same structure as WordNet, but some nodes are additionally attached to disambiguated vocabulary of other languages. Two different, complementary approaches are explored. In one of the approaches, each entry of the dictionary is taken in turn, exploiting the information in the entry itself. The inferential capability for disambiguating the translation is given by Semantic Density over WordNet. In the other approach, the bilingual dictionary was merged with WordNet, exploiting mainly synonymy relations. Each of the approaches was used on a different dictionary. Both approaches attain high levels of precision on their own, showing that disambiguating bilingual nominal entries, and therefore linking bilingual dictionaries to WordNet, is a feasible task.
