Source author record

Jens Dörpinghaus

Jens Dörpinghaus appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Data Structures and Algorithms, Databases, Information Retrieval, Machine Learning, Social and Information Networks

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

Noise-Aware Named Entity Recognition for Historical VET Documents

This paper addresses Named Entity Recognition (NER) in the domain of Vocational Education and Training (VET), focusing on historical, digitized documents that suffer from OCR-induced noise. We propose a robust NER approach leveraging Noise-Aware Training (NAT) with synthetically injected OCR errors, transfer learning, and multi-stage fine-tuning. Three complementary strategies, training on noisy, clean, and artificial data, are systematically compared. Our method is one of the first to recognize multiple entity types in VET documents. It is applied to German documents but transferable to arbitrary languages. Experimental results demonstrate that domain-specific and noise-aware fine-tuning substantially increases robustness and accuracy under noisy conditions. We provide publicly available code for reproducible noise-aware NER in domain-specific contexts.
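The idea of synthetically injected OCR errors can be illustrated with a minimal sketch. It is not the authors' published code: the confusion table, the function name `inject_ocr_noise`, the `error_rate` parameter, and the BIO-style labels in the usage note are all assumptions made for illustration. The sketch only substitutes characters, so every token keeps its position and the entity labels stay aligned with the noisy text.

```python
import random

# Hypothetical confusion table: character pairs that OCR engines may mix up.
# These pairs are illustrative assumptions, not taken from the paper.
OCR_CONFUSIONS = {
    "l": "1",
    "O": "0",
    "e": "c",
    "u": "n",
    "ü": "u",
    "ß": "B",
}


def inject_ocr_noise(tokens, labels, error_rate=0.1, seed=0):
    """Return noisy copies of the tokens plus a copy of the unchanged labels."""
    rng = random.Random(seed)  # seeded so the noisy corpus is reproducible
    noisy = []
    for token in tokens:
        chars = []
        for ch in token:
            # Swap a confusable character with probability error_rate.
            if ch in OCR_CONFUSIONS and rng.random() < error_rate:
                chars.append(OCR_CONFUSIONS[ch])
            else:
                chars.append(ch)
        noisy.append("".join(chars))
    # Substitution never adds or removes tokens, so labels map one-to-one.
    return noisy, list(labels)
```

For example, calling `inject_ocr_noise(["Berufsschule", "in", "Köln"], ["B-ORG", "O", "B-LOC"], error_rate=0.2)` would return a perturbed token list with the same three labels, which could then be mixed with clean data during fine-tuning.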

preprint · 2022 · arXiv

Centrality Measures in multi-layer Knowledge Graphs

Knowledge graphs play a central role in linking different data, which leads to multiple layers. Thus, they are widely used in big data integration, especially for connecting data from different domains. Few studies have investigated the question of how multiple layers within graphs impact methods and algorithms developed for single-purpose networks, for example social networks. This manuscript investigates the impact of multiple layers on centrality measures compared to single-purpose graphs. In particular, (a) we develop an experimental environment to (b) evaluate two different centrality measures, degree and betweenness centrality, on random graphs inspired by social network analysis: small-world and scale-free networks. The presented approach (c) shows that the structure and topology of a graph have a great impact on its robustness to additional stored data. Although the experimental analysis of random graphs allows us to make some basic observations, we will (d) make suggestions for additional research on particular graph structures that have a great impact on the stability of networks.
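The kind of comparison described here can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' experimental environment: a deterministic ring lattice stands in for the random small-world and scale-free graphs, the second layer is just an extra edge list, and the helper names (`ring_lattice`, `add_layer`) are hypothetical. Betweenness is computed with Brandes' algorithm for unweighted graphs and is left unnormalised.

```python
from collections import deque


def ring_lattice(n, k):
    """Undirected ring in which each node links to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def add_layer(adj, edges):
    """Return a copy of the graph with the edges of an additional layer merged in."""
    merged = {v: set(nbrs) for v, nbrs in adj.items()}
    for u, v in edges:
        merged.setdefault(u, set()).add(v)
        merged.setdefault(v, set()).add(u)
    return merged


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # -1 marks unvisited nodes
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: score / 2 for v, score in bc.items()}
```

A comparison could then compute both measures on `ring_lattice(6, 1)` and again on `add_layer(ring_lattice(6, 1), [(0, 3)])`: before the extra layer all nodes score the same, while afterwards the endpoints of the added edge gain both degree and betweenness relative to their neighbours.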

preprint · 2020 · arXiv

Towards context in large scale biomedical knowledge graphs

Contextual information is widely considered for NLP and knowledge discovery in life sciences since it highly influences the exact meaning of natural language. The scientific challenge is not only to extract such context data, but also to store this data for further query and discovery approaches. Here, we propose a multi-step knowledge graph approach using labeled property graphs based on polyglot persistence systems to utilize context data for context mining, graph queries, knowledge discovery and extraction. We introduce the graph-theoretic foundation for a general context concept within semantic networks and show a proof of concept based on biomedical literature and text mining. Our test system contains a knowledge graph derived from the entirety of PubMed and SCAIView data and is enriched with text mining data and domain-specific language data using BEL. Here, context is a more general concept than annotations. This dense graph has more than 71M nodes and 850M relationships. We discuss the impact of this novel approach with 27 real-world use cases represented by graph queries.
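To make the notion of context stored in a labeled property graph more concrete, here is a small in-memory sketch. Everything in it is an assumption for illustration: the node labels (`Document`, `Context`), the relationship type `HAS_CONTEXT`, the identifiers, and the class `PropertyGraph` itself are hypothetical and do not reflect the schema or the storage backend of the system described above. The point is only that context becomes a first-class node that graph queries can traverse, rather than an annotation attached to a single document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                  # hypothetical labels: "Document", "Context"
    props: dict = field(default_factory=dict)   # free-form key/value properties


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (source_id, rel_type, target_id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, source_id, rel_type, target_id):
        self.edges.append((source_id, rel_type, target_id))

    def documents_in_context(self, context_id):
        """Sorted ids of Document nodes linked to the given context node."""
        return sorted(
            s for s, r, t in self.edges
            if r == "HAS_CONTEXT" and t == context_id
            and self.nodes[s].label == "Document"
        )
```

With two documents linked to a shared context node such as `ctx:human`, `documents_in_context("ctx:human")` returns both document ids, which mirrors in miniature what a context-based graph query does.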