Researcher profile

Jens Dörpinghaus

Jens Dörpinghaus contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 15 (Unverified), Verification L1, Unclaimed author
3 works
0 followers
6 topics
4 close collaborators

Published work

3 published items

Preprint, 2026, arXiv

Noise-Aware Named Entity Recognition for Historical VET Documents

This paper addresses Named Entity Recognition (NER) in the domain of Vocational Education and Training (VET), focusing on historical, digitized documents that suffer from OCR-induced noise. We propose a robust NER approach leveraging Noise-Aware Training (NAT) with synthetically injected OCR errors, transfer learning, and multi-stage fine-tuning. Three complementary strategies, training on noisy, clean, and artificial data, are systematically compared. Our method is one of the first to recognize multiple entity types in VET documents. It is applied to German documents but transferable to arbitrary languages. Experimental results demonstrate that domain-specific and noise-aware fine-tuning substantially increases robustness and accuracy under noisy conditions. We provide publicly available code for reproducible noise-aware NER in domain-specific contexts.
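The idea of synthetically injecting OCR errors into training data can be illustrated with a minimal sketch. It is not the authors' implementation: the confusion table, the function name `inject_ocr_noise`, the `error_rate` parameter, and the BIO-style labels in the usage note are all hypothetical, chosen only to show how noisy tokens can stay aligned with their entity labels.

```python
import random

# Hypothetical confusion table: character substitutions of the kind OCR
# engines tend to produce. Illustrative assumptions, not taken from the paper.
OCR_CONFUSIONS = {
    "l": ["1", "I"],
    "o": ["0"],
    "e": ["c"],
    "m": ["rn"],
    "ü": ["u", "ii"],
    "ß": ["B"],
}


def inject_ocr_noise(tokens, labels, error_rate=0.1, seed=0):
    """Return noisy copies of the tokens; labels stay aligned one-to-one."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    noisy = []
    for token in tokens:
        chars = []
        for ch in token:
            # Corrupt a confusable character with probability error_rate.
            if ch in OCR_CONFUSIONS and rng.random() < error_rate:
                chars.append(rng.choice(OCR_CONFUSIONS[ch]))
            else:
                chars.append(ch)
        noisy.append("".join(chars))
    # Noise is applied inside tokens only, so the label sequence is unchanged.
    return noisy, list(labels)
```

For example, `inject_ocr_noise(["Berufsschule", "Köln"], ["B-ORG", "B-LOC"], error_rate=0.3)` would return a corrupted token list together with the same two labels, which could then be mixed into the clean training set.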

Preprint, 2022, arXiv

Centrality Measures in multi-layer Knowledge Graphs

Knowledge graphs play a central role in linking different data, which leads to multiple layers. Thus, they are widely used in big data integration, especially for connecting data from different domains. Few studies have investigated how multiple layers within graphs impact methods and algorithms developed for single-purpose networks, for example social networks. This manuscript investigates the impact of multiple layers on centrality measures compared to single-purpose graphs. In particular, (a) we develop an experimental environment to (b) evaluate two different centrality measures, degree and betweenness centrality, on random graphs inspired by social network analysis: small-world and scale-free networks. The presented approach (c) shows that graph structure and topology have a great impact on a graph's robustness to additional stored data. Although the experimental analysis of random graphs allows us to make some basic observations, we (d) make suggestions for additional research on particular graph structures that have a great impact on the stability of networks.
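How an additional layer can shift a centrality ranking can be seen in a small sketch. It uses a fixed toy graph rather than the small-world and scale-free random graphs studied in the paper, and covers degree centrality only. The edge lists and the function name `degree_centrality` are hypothetical.

```python
def degree_centrality(edges):
    """Degree centrality: node degree divided by (n - 1) for n nodes.

    Assumes an undirected graph without self-loops and at least one edge.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy data: a base layer plus one additional layer.
base_layer = [("a", "b"), ("a", "c"), ("a", "d")]
extra_layer = [("d", "e"), ("d", "f"), ("d", "g"), ("d", "b")]

single = degree_centrality(base_layer)
combined = degree_centrality(base_layer + extra_layer)

# The most central node differs once the second layer is merged in.
top_single = max(single, key=single.get)
top_combined = max(combined, key=combined.get)
```

In the base layer node `a` is the hub, but after the extra layer is merged, node `d` has the highest degree centrality. The ranking depends on which layers are included.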

Preprint, 2020, arXiv

Towards context in large scale biomedical knowledge graphs

Contextual information is widely considered for NLP and knowledge discovery in life sciences since it strongly influences the exact meaning of natural language. The scientific challenge is not only to extract such context data, but also to store this data for further query and discovery approaches. Here, we propose a multi-step knowledge graph approach using labeled property graphs based on polyglot persistence systems to utilize context data for context mining, graph queries, knowledge discovery and extraction. We introduce the graph-theoretic foundation for a general context concept within semantic networks and show a proof-of-concept based on biomedical literature and text mining. Our test system contains a knowledge graph derived from the entirety of PubMed and SCAIView data and is enriched with text mining data and domain-specific language data using BEL. Here, context is a more general concept than annotations. This dense graph has more than 71M nodes and 850M relationships. We discuss the impact of this novel approach with 27 real-world use cases represented by graph queries.