Source author record

Nicolas Heist

Nicolas Heist appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Information Retrieval Computation and Language Databases

Catalog footprint

What is connected

4works

4topics

3close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

The CaLiGraph Ontology as a Challenge for OWL Reasoners

CaLiGraph is a large-scale cross-domain knowledge graph generated from Wikipedia by exploiting the category system, list pages, and other list structures in Wikipedia, containing more than 15 million typed entities and around 10 million relation assertions. Other than knowledge graphs such as DBpedia and YAGO, whose ontologies are comparably simplistic, CaLiGraph also has a rich ontology, comprising more than 200,000 class restrictions. Those two properties - a large A-box and a rich ontology - make it an interesting challenge for benchmarking reasoners. In this paper, we show that a reasoning task which is particularly relevant for CaLiGraph, i.e., the materialization of owl:hasValue constraints into assertions between individuals and between individuals and literals, is insufficiently supported by available reasoning systems. We provide differently sized benchmark subsets of CaLiGraph, which can be used for performance analysis of reasoning systems.

preprint2021arXiv

Information Extraction From Co-Occurring Similar Entities

Knowledge about entities and their interrelations is a crucial factor of success for tasks like question answering or text summarization. Publicly available knowledge graphs like Wikidata or DBpedia are, however, far from being complete. In this paper, we explore how information extracted from similar entities that co-occur in structures like tables or lists can help to increase the coverage of such knowledge graphs. In contrast to existing approaches, we do not focus on relationships within a listing (e.g., between two entities in a table row) but on the relationship between a listing's subject entities and the context of the listing. To that end, we propose a descriptive rule mining approach that uses distant supervision to derive rules for these relationships based on a listing's context. Extracted from a suitable data corpus, the rules can be used to extend a knowledge graph with novel entities and assertions. In our experiments we demonstrate that the approach is able to extract up to 3M novel entities and 30M additional assertions from listings in Wikipedia. We find that the extracted information is of high quality and thus suitable to extend Wikipedia-based knowledge graphs like DBpedia, YAGO, and CaLiGraph. For the case of DBpedia, this would result in an increase of covered entities by roughly 50%.

preprint2020arXiv

Entity Extraction from Wikipedia List Pages

When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. Especially, as Wikipedia's policies permit pages about subjects only if they have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is oftentimes available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia's list pages, which have proven to serve as a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages that we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.

preprint2020arXiv

Knowledge Graphs on the Web -- an Overview

Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowledge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chapter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap.

Nicolas Heist

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

The CaLiGraph Ontology as a Challenge for OWL Reasoners

Information Extraction From Co-Occurring Similar Entities

Entity Extraction from Wikipedia List Pages

Knowledge Graphs on the Web -- an Overview