Source author record

Elena Simperl

Elena Simperl appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Topics: Computation and Language, cs.CY, Databases, Information Retrieval, Machine Learning, physics.soc-ph, Social and Information Networks

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators


Preprint, 2024, arXiv

Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs

Conversational question answering systems often rely on semantic parsing to enable interactive information retrieval, which involves the generation of structured database queries from a natural language input. For information-seeking conversations about facts stored within a knowledge graph, dialogue utterances are transformed into graph queries in a process that is called knowledge-based conversational question answering. This paper evaluates the performance of large language models that have not been explicitly pre-trained on this task. Through a series of experiments on an extensive benchmark dataset, we compare models of varying sizes with different prompting techniques and identify common issue types in the generated output. Our results demonstrate that large language models are capable of generating graph queries from dialogues, with significant improvements achievable through few-shot prompting and fine-tuning techniques, especially for smaller models that exhibit lower zero-shot performance.

Preprint, 2022, arXiv

Statistical and Neural Methods for Cross-lingual Entity Label Mapping in Knowledge Graphs

Knowledge bases such as Wikidata amass vast amounts of named entity information, such as multilingual labels, which can be extremely useful for various multilingual and cross-lingual applications. However, such labels are not guaranteed to match across languages from an information consistency standpoint, greatly compromising their usefulness for fields such as machine translation. In this work, we investigate the application of word and sentence alignment techniques coupled with a matching algorithm to align cross-lingual entity labels extracted from Wikidata in 10 languages. Our results indicate that mapping between Wikidata's main labels stands to be considerably improved (up to 20 points in F1-score) by any of the employed methods. We show how methods relying on sentence embeddings outperform all others, even across different scripts. We believe the application of such techniques to measure the similarity of label pairs, coupled with a knowledge base rich in high-quality entity labels, to be an excellent asset to machine translation.

Preprint, 2022, arXiv

WDV: A Broad Data Verbalisation Dataset Built from Wikidata

Data verbalisation is a task of great importance in the current field of natural language processing, as there is great benefit in the transformation of our abundant structured and semi-structured data into human-readable formats. Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text. Although KG verbalisation datasets exist for some KGs, there are still gaps in their fitness for use in many scenarios. This is especially true for Wikidata, where available datasets either loosely couple claim sets with textual information or heavily focus on predicates around biographies, cities, and countries. To address these gaps, we propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text, covering a wide variety of entities and predicates. We also evaluate the quality of our verbalisations through a reusable workflow for measuring human-centred fluency and adequacy scores. Our data and code are openly available in the hopes of furthering research towards KG verbalisation.

Preprint, 2015, arXiv

RDF-Hunter: Automatically Crowdsourcing the Execution of Queries Against RDF Data Sets

In recent years, a large number of RDF data sets have become available on the Web. However, due to the semi-structured nature of RDF data, missing values affect the answer completeness of queries posed against this data. To overcome this limitation, we propose RDF-Hunter, a novel hybrid query processing approach that brings together machine and human computation to execute queries against RDF data. We develop a novel quality model and query engine to enable RDF-Hunter to decide on the fly which parts of a query should be executed through conventional technology or crowd computing. To evaluate RDF-Hunter, we created a collection of 50 SPARQL queries against the DBpedia data set, executed them using our hybrid query engine, and analyzed the accuracy of the outcomes obtained from the crowd. The experiments clearly show that the overall approach is feasible and produces query results that reliably and significantly enhance the completeness of automatic query processing responses.

Preprint, 2014, arXiv

Collective Intelligence in Citizen Science -- A Study of Performers and Talkers

The recent emergence of online citizen science is illustrative of an efficient and effective means to harness the crowd in order to achieve a range of scientific discoveries. Fundamentally, citizen science projects draw upon crowds of non-expert volunteers to complete short tasks, which can vary in domain and complexity. However, unlike in most human-computation systems, the participants in these systems, the 'citizen scientists', are volunteers to whom no incentives, financial or otherwise, are offered. Furthermore, encouraged by citizen science platforms such as Zooniverse, online communities have emerged, providing volunteers with an environment to discuss, share ideas, and solve problems. In fact, these forums have enabled a number of scientific discoveries to be made. In this paper we explore the phenomenon of collective intelligence via the relationship between the activities of online citizen science communities and the discovery of scientific knowledge. We perform a cross-project analysis of ten Zooniverse citizen science projects, analyse the behaviour of users with regard to their task completion activity and participation in discussion, and discover collective behaviour amongst highly active users. Whilst our findings have implications for future citizen science design, we also consider the wider implications for understanding collective intelligence research in general.