Researcher profile

Maria-Esther Vidal

Maria-Esther Vidal contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation

Generating Knowledge Graphs (KGs) remains one of the most time-consuming and labor-intensive tasks for knowledge engineers, as they need to identify semantic equivalences between input data sources and ontology terms. While declarative solutions (e.g., RML, SPARQL-Anything) have helped to generalize this process, aligning input schema elements with ontology terms still involves intricate transformations and requires considerable manual effort. With the advent of Large Language Models (LLMs), there is growing interest in leveraging their capabilities to assist KG engineers. Although some studies have explored using LLMs to automate KG construction, there is still no standardized framework for assessing how effectively they establish correspondences between data schemes and ontology concepts. Therefore, in this paper, we propose BLINKG, a benchmark designed to evaluate the mapping capabilities of LLMs in constructing KGs from heterogeneous data sources. The benchmark includes a set of scenarios with increasing complexity, based on real-world use cases. We conduct an extensive experimental evaluation of several stateof-the-art LLMs using BLINK and observe that they already offer promising solutions. However, their performance remains limited in complex scenarios. Thanks to this benchmark, we can already assess the current capabilities of LLMs for KG construction. Additionally, we define a set of requirements for achieving (semi)automated (LLM-driven) KG construction, opening new research lines in this area.

preprint2022arXiv

Efficient Semantic Summary Graphs for Querying Large Knowledge Graphs

Knowledge Graphs (KGs) integrate heterogeneous data, but one challenge is the development of efficient tools for allowing end users to extract useful insights from these sources of knowledge. In such a context, reducing the size of a Resource Description Framework (RDF) graph while preserving all information can speed up query engines by limiting data shuffle, especially in a distributed setting. This paper presents two algorithms for RDF graph summarization: Grouping Based Summarization (GBS) and Query Based Summarization (QBS). The latter is an optimized and lossless approach for the former method. We empirically study the effectiveness of the proposed lossless RDF graph summarization to retrieve complete data, by rewriting an RDF Query Language called SPARQL query with fewer triple patterns using a semantic similarity. We conduct our experimental study in instances of four datasets with different sizes. Compared with the state-of-the-art query engine Sparklify executed over the original RDF graphs as a baseline, QBS query execution time is reduced by up to 80% and the summarized RDF graph is decreased by up to 99%.

preprint2021arXiv

Trav-SHACL: Efficiently Validating Networks of SHACL Constraints

Knowledge graphs have emerged as expressive data structures for Web data. Knowledge graph potential and the demand for ecosystems to facilitate their creation, curation, and understanding, is testified in diverse domains, e.g., biomedicine. The Shapes Constraint Language (SHACL) is the W3C recommendation language for integrity constraints over RDF knowledge graphs. Enabling quality assements of knowledge graphs, SHACL is rapidly gaining attention in real-world scenarios. SHACL models integrity constraints as a network of shapes, where a shape contains the constraints to be fullfiled by the same entities. The validation of a SHACL shape schema can face the issue of tractability during validation. To facilitate full adoption, efficient computational methods are required. We present Trav-SHACL, a SHACL engine capable of planning the traversal and execution of a shape schema in a way that invalid entities are detected early and needless validations are minimized. Trav-SHACL reorders the shapes in a shape schema for efficient validation and rewrites target and constraint queries for the fast detection of invalid entities. Trav-SHACL is empirically evaluated on 27 testbeds executed against knowledge graphs of up to 34M triples. Our experimental results suggest that Trav-SHACL exhibits high performance gradually and reduces validation time by a factor of up to 28.93 compared to the state of the art.

preprint2020arXiv

Bias in Data-driven AI Systems -- An Introductory Survey

AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multi-disciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions as well as to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful Machine Learning (ML) algorithms. If otherwise not specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the bases of demographic features like race, sex, etc.

preprint2020arXiv

Compacting Frequent Star Patterns in RDF Graphs

Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. In case the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities matching the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally match the frequent star pattern. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes evidence the efficiency of proposed approach and show that our techniques are able to reduce execution time of the baseline approach in at least three orders of magnitude reducing the RDF graph size by up to 66.56%.

preprint2020arXiv

Falcon 2.0: An Entity and Relation Linking Tool over Wikidata

The Natural Language Processing (NLP) community has significantly contributed to the solutions for entity and relation recognition from the text, and possibly linking them to proper matches in Knowledge Graphs (KGs). Considering Wikidata as the background KG, still, there are limited tools to link knowledge within the text to Wikidata. In this paper, we present Falcon 2.0, first joint entity, and relation linking tool over Wikidata. It receives a short natural language text in the English language and outputs a ranked list of entities and relations annotated with the proper candidates in Wikidata. The candidates are represented by their Internationalized Resource Identifier (IRI) in Wikidata. Falcon 2.0 resorts to the English language model for the recognition task (e.g., N-Gram tiling and N-Gram splitting), and then an optimization approach for linking task. We have empirically studied the performance of Falcon 2.0 on Wikidata and concluded that it outperforms all the existing baselines. Falcon 2.0 is public and can be reused by the community; all the required instructions of Falcon 2.0 are well-documented at our GitHub repository. We also demonstrate an online API, which can be run without any technical expertise. Falcon 2.0 and its background knowledge bases are available as resources at https://labs.tib.eu/falcon/falcon2/.

preprint2020arXiv

No One is Perfect: Analysing the Performance of Question Answering Components over the DBpedia Knowledge Graph

Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting and most approaches currently use a pipeline of processing steps connecting a sequence of components. In this article, we analyse and micro evaluate the behaviour of 29 available QA components for DBpedia knowledge graph that were released by the research community since 2010. As a result, we provide a perspective on collective failure cases, suggest characteristics of QA components that prevent them from performing better and provide future challenges and research directions for the field.

preprint2020arXiv

Optimizing Federated Queries Based on the Physical Design of a Data Lake

The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.

preprint2020arXiv

Unveiling Relations in the Industry 4.0 Standards Landscape based on Knowledge Graph Embeddings

Industry~4.0 (I4.0) standards and standardization frameworks have been proposed with the goal of \emph{empowering interoperability} in smart factories. These standards enable the description and interaction of the main components, systems, and processes inside of a smart factory. Due to the growing number of frameworks and standards, there is an increasing need for approaches that automatically analyze the landscape of I4.0 standards. Standardization frameworks classify standards according to their functions into layers and dimensions. However, similar standards can be classified differently across the frameworks, producing, thus, interoperability conflicts among them. Semantic-based approaches that rely on ontologies and knowledge graphs, have been proposed to represent standards, known relations among them, as well as their classification according to existing frameworks. Albeit informative, the structured modeling of the I4.0 landscape only provides the foundations for detecting interoperability issues. Thus, graph-based analytical methods able to exploit knowledge encoded by these approaches, are required to uncover alignments among standards. We study the relatedness among standards and frameworks based on community analysis to discover knowledge that helps to cope with interoperability conflicts between standards. We use knowledge graph embeddings to automatically create these communities exploiting the meaning of the existing relationships. In particular, we focus on the identification of similar standards, i.e., communities of standards, and analyze their properties to detect unknown relations. We empirically evaluate our approach on a knowledge graph of I4.0 standards using the Trans$^*$ family of embedding models for knowledge graph entities. Our results are promising and suggest that relations among standards can be detected accurately.