Source author record

Goran Nenadic

Goran Nenadic appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Artificial Intelligence

Catalog footprint

What is connected

4works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

A Framework for Evaluation of Machine Reading Comprehension Gold Standards

Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of gold standards that are used to evaluate them. There is but a limited understanding of the challenges present in this data, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate the present linguistic features, required reasoning and background knowledge and factual correctness on one hand, and the presence of lexical cues as a lower bound for the requirement of understanding on the other hand. We propose a qualitative annotation schema for the first and a set of approximative metrics for the latter. In a first application of the framework, we analyse modern MRC gold standards and present our findings: the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.

preprint2020arXiv

Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models

Recent years have seen a growing number of publications that analyse Natural Language Inference (NLI) datasets for superficial cues, whether they undermine the complexity of the tasks underlying those datasets and how they impact those models that are optimised and evaluated on this data. This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language. We summarise and discuss the findings and conclude with a set of recommendations for possible future research directions. We hope it will be a useful resource for researchers who propose new datasets, to have a set of tools to assess the suitability and quality of their data to evaluate various phenomena of interest, as well as those who develop novel architectures, to further understand the implications of their improvements with respect to their model's acquired capabilities.

preprint2016arXiv

As Cool as a Cucumber: Towards a Corpus of Contemporary Similes in Serbian

Similes are natural language expressions used to compare unlikely things, where the comparison is not taken literally. They are often used in everyday communication and are an important part of cultural heritage. Having an up-to-date corpus of similes is challenging, as they are constantly coined and/or adapted to the contemporary times. In this paper we present a methodology for semi-automated collection of similes from the world wide web using text mining techniques. We expanded an existing corpus of traditional similes (containing 333 similes) by collecting 446 additional expressions. We, also, explore how crowdsourcing can be used to extract and curate new similes.

preprint2013arXiv

ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge

This paper describes a temporal expression identification and normalization system, ManTIME, developed for the TempEval-3 challenge. The identification phase combines the use of conditional random fields along with a post-processing identification pipeline, whereas the normalization phase is carried out using NorMA, an open-source rule-based temporal normalizer. We investigate the performance variation with respect to different feature types. Specifically, we show that the use of WordNet-based features in the identification task negatively affects the overall performance, and that there is no statistically significant difference in using gazetteers, shallow parsing and propositional noun phrases labels on top of the morphological features. On the test data, the best run achieved 0.95 (P), 0.85 (R) and 0.90 (F1) in the identification phase. Normalization accuracies are 0.84 (type attribute) and 0.77 (value attribute). Surprisingly, the use of the silver data (alone or in addition to the gold annotated ones) does not improve the performance.