Source author record

Robert Hoehndorf

Robert Hoehndorf appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Machine Learning, Quantitative Methods, Databases, Information Retrieval, Computation and Language, Genomics, Logic in Computer Science

Catalog footprint

What is connected

9 works

8 topics

4 close collaborators

preprint · 2023 · arXiv

Semantic Units: Organizing knowledge graphs into semantically meaningful units of representation

Knowledge graphs and ontologies are becoming increasingly important as technical solutions for Findable, Accessible, Interoperable, and Reusable data and metadata (FAIR Guiding Principles). We discuss four challenges that impede the use of FAIR knowledge graphs and propose semantic units as their potential solution. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs. Each unit is represented by its own resource, instantiates a corresponding semantic unit class, and can be implemented as a FAIR Digital Object and a nanopublication in RDF/OWL and property graphs. We distinguish statement and compound units as basic categories of semantic units. Statement units represent the smallest independent propositions that are semantically meaningful for a human reader. They consist of one or more triples and mathematically partition a knowledge graph. We distinguish assertional, contingent (prototypical), and universal statement units as basic types of statement units and propose representational schemes and formal semantics for them (including for absence statements, negations, and cardinality restrictions) that do not involve blank nodes and that translate back to OWL. Compound units, on the other hand, represent semantically meaningful collections of semantic units, and we distinguish various types of compound units, representing different levels of representational granularity, different types of granularity trees, and different frames of reference. Semantic units support making statements about statements and can be used for graph alignment, subgraph matching, knowledge graph profiling, and managing access restrictions to sensitive data. Organizing the graph into semantic units supports the separation of ontological, diagnostic (i.e., referential), and discursive information, and it also supports the differentiation of multiple frames of reference.

preprint · 2022 · arXiv

Description Logic EL++ Embeddings with Intersectional Closure

Many ontologies, in particular in the biomedical domain, are based on the Description Logic EL++. Several efforts have been made to interpret and exploit EL++ ontologies by distributed representation learning. Specifically, concepts within EL++ theories have been represented as n-balls within an n-dimensional embedding space. However, the intersectional closure is not satisfied when using n-balls to represent concepts because the intersection of two n-balls is not an n-ball. This leads to challenges when measuring the distance between concepts and inferring equivalence between concepts. To this end, we developed EL Box Embedding (ELBE) to learn Description Logic EL++ embeddings using axis-parallel boxes. We generate specially designed box-based geometric constraints from EL++ axioms for model training. Since the intersection of boxes remains as a box, the intersectional closure is satisfied. We report extensive experimental results on three datasets and present a case study to demonstrate the effectiveness of the proposed method.
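The closure property described in this abstract can be illustrated with a minimal sketch. It is not the ELBE implementation: the `Box` class and the `intersect` and `contains` helpers are hypothetical names chosen for illustration, and the code only shows the geometric fact that the intersection of two axis-parallel boxes is again an axis-parallel box (or empty).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-parallel box given by its lower and upper corners (hypothetical helper)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if they do not overlap."""
    # Per dimension, the overlap runs from the larger lower bound
    # to the smaller upper bound.
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    if any(lo > hi for lo, hi in zip(lower, upper)):
        return None  # empty in at least one dimension
    return Box(lower, upper)


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return all(
        o_lo <= i_lo and i_hi <= o_hi
        for o_lo, i_lo, i_hi, o_hi in zip(
            outer.lower, inner.lower, inner.upper, outer.upper
        )
    )


a = Box((0.0, 0.0), (2.0, 2.0))
b = Box((1.0, -1.0), (3.0, 1.0))
c = intersect(a, b)  # itself a Box, contained in both a and b
```

Because the result is the same kind of object as its inputs, a conjunction of two concepts can be compared or intersected again with no special casing, which is the property that n-balls lack.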

preprint · 2022 · arXiv

Positive-Unlabeled Learning with Adversarial Data Augmentation for Knowledge Graph Completion

Most real-world knowledge graphs (KGs) are far from complete and comprehensive. This problem has motivated efforts in predicting the most plausible missing facts to complete a given KG, i.e., knowledge graph completion (KGC). However, existing KGC methods suffer from two main issues: 1) the false negative issue, i.e., the sampled negative training instances may include potential true facts; and 2) the data sparsity issue, i.e., true facts account for only a tiny part of all possible facts. To this end, we propose positive-unlabeled learning with adversarial data augmentation (PUDA) for KGC. In particular, PUDA tailors a positive-unlabeled risk estimator for the KGC task to deal with the false negative issue. Furthermore, to address the data sparsity issue, PUDA implements a data augmentation strategy by unifying adversarial training and positive-unlabeled learning under the positive-unlabeled minimax game. Extensive experimental results on real-world benchmark datasets demonstrate the effectiveness and compatibility of our proposed method.
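For readers unfamiliar with positive-unlabeled risk estimation, the following is a minimal sketch of a generic non-negative PU risk with a sigmoid loss. It is an assumption-laden illustration, not the estimator PUDA tailors for KGC: the function names, the choice of loss, and the class prior `prior` are all hypothetical.

```python
import math
from typing import Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss 1 / (1 + exp(label * score)); small when the sign of score matches label."""
    return 1.0 / (1.0 + math.exp(label * score))


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def nn_pu_risk(
    pos_scores: Sequence[float],
    unl_scores: Sequence[float],
    prior: float,
) -> float:
    """Generic non-negative PU risk (illustrative; `prior` is an assumed class prior).

    Unlabeled triples are treated as negatives, then corrected by subtracting
    the share of that loss attributable to hidden positives. The correction is
    clamped at zero so the estimated negative risk never becomes negative.
    """
    risk_pos_as_pos = _mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = _mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = _mean([sigmoid_loss(s, -1) for s in unl_scores])
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


# Scores of observed (positive) triples and of sampled, unlabeled triples.
risk = nn_pu_risk(pos_scores=[2.0, 1.5], unl_scores=[-1.0, 0.5, -2.0], prior=0.1)
```

The point of the correction term is that an unlabeled triple is not assumed to be false, which is how this family of estimators addresses the false negative issue named in the abstract.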

preprint · 2022 · arXiv

TAR: Neural Logical Reasoning across TBox and ABox

Many ontologies, i.e., Description Logic (DL) knowledge bases, have been developed to provide rich knowledge about various domains. An ontology consists of an ABox, i.e., assertion axioms between two entities or between a concept and an entity, and a TBox, i.e., terminology axioms between two concepts. Neural logical reasoning (NLR) is a fundamental task to explore such knowledge bases, which aims at answering multi-hop queries with logical operations based on distributed representations of queries and answers. While previous NLR methods can give specific entity-level answers, i.e., ABox answers, they are not able to provide descriptive concept-level answers, i.e., TBox answers, where each concept is a description of a set of entities. In other words, previous NLR methods only reason over the ABox of an ontology while ignoring the TBox. In particular, providing TBox answers enables inferring the explanations of each query with descriptive concepts, which make answers comprehensible to users and are of great usefulness in the field of applied ontology. In this work, we formulate the problem of neural logical reasoning across TBox and ABox (TA-NLR), solving which needs to address challenges in incorporating, representing, and operating on concepts. We propose an original solution named TAR for TA-NLR. Firstly, we incorporate description logic based ontological axioms to provide the source of concepts. Then, we represent concepts and queries as fuzzy sets, i.e., sets whose elements have degrees of membership, to bridge concepts and queries with entities. Moreover, we design operators involving concepts on top of fuzzy set representation of concepts and queries for optimization and inference. Extensive experimental results on two real-world datasets demonstrate the effectiveness of TAR for TA-NLR.
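The fuzzy-set representation mentioned in this abstract can be sketched as follows. This is a minimal illustration using the standard Zadeh operators (minimum, maximum, complement); the abstract does not say which operators TAR uses, so the operator choice, the function names, and the entity identifiers are assumptions.

```python
from typing import Dict, Iterable

# A fuzzy set maps each entity to a membership degree in [0, 1].
FuzzySet = Dict[str, float]


def fuzzy_and(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Intersection: an entity belongs to both sets to the lesser of its two degrees."""
    return {e: min(a.get(e, 0.0), b.get(e, 0.0)) for e in set(a) | set(b)}


def fuzzy_or(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Union: an entity belongs to either set to the greater of its two degrees."""
    return {e: max(a.get(e, 0.0), b.get(e, 0.0)) for e in set(a) | set(b)}


def fuzzy_not(a: FuzzySet, universe: Iterable[str]) -> FuzzySet:
    """Complement relative to an explicit universe of entities."""
    return {e: 1.0 - a.get(e, 0.0) for e in universe}


# Hypothetical concept and query, both expressed over the same entities.
concept = {"e1": 0.75, "e2": 0.5}
query = {"e2": 0.25, "e3": 1.0}
both = fuzzy_and(concept, query)
either = fuzzy_or(concept, query)
outside = fuzzy_not(concept, ["e1", "e2", "e3"])
```

Representing concepts and queries over the same set of entities is what lets a query be compared with a concept directly, so that an answer can be a description as well as a list of entities.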

preprint · 2020 · arXiv

Efficient long-distance relation extraction with DG-SpanBERT

In natural language processing, relation extraction seeks to rationally understand unstructured text. Here, we propose a novel SpanBERT-based graph convolutional network (DG-SpanBERT) that extracts semantic features from a raw sentence using the pre-trained language model SpanBERT and a graph convolutional network (GCN) to pool latent features. Our DG-SpanBERT model inherits the advantage of SpanBERT in learning rich lexical features from large-scale corpora. It can also capture long-range relations between entities through the use of a GCN on the dependency tree. The experimental results show that our model outperforms other existing dependency-based and sequence-based models and achieves state-of-the-art performance on the TACRED dataset.

preprint · 2016 · arXiv

Prediction of Metabolic Pathways Involvement in Prokaryotic UniProtKB Data by Association Rule Mining

The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage while minimizing erroneous functional assignments. This trade-off poses a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We evaluated the performance of our system using cross-validation. We found that it achieved very promising results in pathway identification, with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, of which 436,510 lacked any previous pathway annotations.
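As a rough illustration of what a mined association rule measures, the sketch below computes the support and confidence of a single rule over a handful of annotated entries. It is not the system from the abstract: the record contents, the item names, and the `rule_stats` helper are invented for illustration, and only the standard definitions of support and confidence are assumed.

```python
from typing import FrozenSet, Sequence, Tuple


def rule_stats(
    records: Sequence[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: str,
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> consequent`.

    support    = fraction of all records containing antecedent and consequent
    confidence = fraction of records containing the antecedent that also
                 contain the consequent
    """
    n_total = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if antecedent <= r and consequent in r)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical entries: each is a set of annotation items, some with a pathway.
records = [
    frozenset({"domain:A", "taxon:B", "pathway:P"}),
    frozenset({"domain:A", "taxon:B", "pathway:P"}),
    frozenset({"domain:A", "taxon:B"}),
    frozenset({"domain:C"}),
]
support, confidence = rule_stats(
    records, frozenset({"domain:A", "taxon:B"}), "pathway:P"
)
```

A rule with high confidence can then be applied to entries that match its antecedent but carry no pathway annotation, which is the sense in which mined rules act as predictive models.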

preprint · 2014 · arXiv

Aber-OWL: a framework for ontology-based data access in biology

Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning. We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net.

preprint · 2014 · arXiv

Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases

Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 8,000 diseases. We demonstrate that our method generates phenotypes that correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that share signs and symptoms cluster together, and we use this network to identify phenotypic disease modules.

preprint · 2012 · arXiv

Towards quantitative measures in applied ontology

Applied ontology is a relatively new field which aims to apply theories and methods from diverse disciplines such as philosophy, cognitive science, linguistics and formal logics to perform or improve domain-specific tasks. To support the development of effective research methodologies for applied ontology, we critically discuss the question of how its research results should be evaluated. We propose that results in applied ontology must be evaluated within their domain of application, based on some ontology-based task within the domain, and discuss quantitative measures which would facilitate the objective evaluation and comparison of research results in applied ontology.
