Researcher profile

Stefan Dietze

Stefan Dietze contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works
0 followers
10 topics
4 close collaborators

Published work

8 published items

Preprint · 2023 · arXiv

nuScenes Knowledge Graph -- A comprehensive semantic representation of traffic scenes for trajectory prediction

Trajectory prediction in traffic scenes involves accurately forecasting the behaviour of surrounding vehicles. To achieve this objective it is crucial to consider contextual information, including the driving path of vehicles, road topology, lane dividers, and traffic rules. Although studies demonstrated the potential of leveraging heterogeneous context for improving trajectory prediction, state-of-the-art deep learning approaches still rely on a limited subset of this information. This is mainly due to the limited availability of comprehensive representations. This paper presents an approach that utilizes knowledge graphs to model the diverse entities and their semantic connections within traffic scenes. Further, we present nuScenes Knowledge Graph (nSKG), a knowledge graph for the nuScenes dataset that explicitly models all scene participants and road elements, as well as their semantic and spatial relationships. To facilitate the usage of the nSKG via graph neural networks for trajectory prediction, we provide the data in a format ready to use with the PyG library. All artefacts can be found here: https://github.com/boschresearch/nuScenes_Knowledge_Graph

Preprint · 2022 · arXiv

SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search

The emerging research field Search as Learning (SAL) investigates how the Web facilitates learning through modern information retrieval systems. SAL research requires significant amounts of data that capture both the search behavior of users and their acquired knowledge in order to obtain conclusive insights or train supervised machine learning models. However, the creation of such datasets is costly and requires interdisciplinary efforts in order to design studies and capture a wide range of features. In this paper, we address this issue and introduce an extensive dataset based on a user study in which 114 participants were asked to learn about the formation of lightning and thunder. Participants' knowledge states were measured before and after Web search through multiple-choice questionnaires and essay-based free recall tasks. To enable future research in SAL-related tasks we recorded a plethora of features and person-related attributes. Besides the screen recordings, visited Web pages, and detailed browsing histories, a large number of behavioral features and resource features were monitored. We underline the usefulness of the dataset by describing three already published use cases.

Preprint · 2022 · arXiv

SciTweets -- A Dataset and Annotation Framework for Detecting Scientific Online Discourse

Scientific topics, claims and resources are increasingly debated as part of online discourse, where prominent examples include discourse related to COVID-19 or climate change. This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines. For instance, communication studies aim at a deeper understanding of biases, quality or spreading patterns of scientific information, whereas computational methods have been proposed to extract, classify or verify scientific claims using NLP and IR techniques. However, research across disciplines currently suffers from a lack of both robust definitions of the various forms of science-relatedness and appropriate ground truth data for distinguishing them. In this work, we contribute (a) an annotation framework and corresponding definitions for different forms of scientific relatedness of online discourse in Tweets, (b) an expert-annotated dataset of 1261 tweets obtained through our labeling framework reaching an average Fleiss' kappa (κ) of 0.63, (c) a multi-label classifier trained on our data able to detect science-relatedness with 89% F1 and also able to detect distinct forms of scientific knowledge (claims, references). With this work we aim to lay the foundation for developing and evaluating robust methods for analysing science as part of large-scale online discourse.
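
For readers unfamiliar with the agreement measure reported above, the following is a minimal sketch of the textbook Fleiss' kappa computation. It is not the authors' code; the function name `fleiss_kappa`, the input layout (one row per item, one column per category, each cell holding the number of annotators who chose that category) and the toy counts are assumptions made purely for illustration.

```python
def fleiss_kappa(counts):
    """Textbook Fleiss' kappa for a table of per-item category counts.

    counts[i][j] = number of annotators who assigned item i to category j.
    Every item must be rated by the same number of annotators (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating is in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (not from the paper): 4 items, 3 annotators, 2 categories.
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
print(round(toy_kappa, 3))
```

An averaged value such as the one in the abstract would presumably come from computing kappa separately per label and taking the mean, but that reading is an assumption rather than something the abstract states.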

Preprint · 2022 · arXiv

Still Haven't Found What You're Looking For -- Detecting the Intent of Web Search Missions from User Interaction Features

Web search is among the most frequent online activities. Whereas traditional information retrieval techniques focus on the information need behind a user query, previous work has shown that user behaviour and interaction can provide important signals for understanding the underlying intent of a search mission. An established taxonomy distinguishes between transactional, navigational and informational search missions, where in particular the latter involve a learning goal, i.e. the intent to acquire knowledge about a particular topic. We introduce a supervised approach for classifying online search missions into one of these categories by utilising a range of features obtained from the user interactions during an online search mission. Applying our model to a dataset of real-world query logs, we show that search missions can be categorised with an average F1 score of 63% and accuracy of 69%, while performance on informational and navigational missions is particularly promising (F1 > 75%). This suggests the potential to utilise such supervised classification during online search to better facilitate retrieval and ranking as well as to improve affiliated services, such as targeted online ads.

Preprint · 2022 · arXiv

The many facets of academic mobility and its impact on scholars' career

International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. The aim of this study is twofold. First, we investigate to what extent individual factors associate with the mobility of researchers. Second, we explore the relationship between mobility and scientific activity and impact. For this purpose, we used a bibliometric approach to track the mobility of authors. To compare the scientific outcomes of researchers, we considered the number of publications and received citations as indicators, as well as the number of unique co-authors in all their publications. We also analysed the co-authorship network of researchers and compared centrality measures of mobile and non-mobile researchers. Results show that researchers from North America and Sub-Saharan Africa, particularly female ones, have the lowest and highest tendency towards international mobility, respectively. Having international co-authors increases the probability of international movement. Our findings uncover gender inequality in international mobility across scientific fields and countries. Across genders, researchers in the Physical sciences have the highest and those in the Social sciences the lowest rate of mobility. We observed more mobility for Social scientists at the advanced career stage, while researchers in other fields prefer to move at earlier career stages. Also, we found a positive correlation between mobility and scientific outcomes, but no apparent difference between females and males. Comparing the centrality of mobile and non-mobile researchers in the co-authorship networks reveals a higher social capital advantage for mobile researchers.

Preprint · 2021 · arXiv

Better Together -- An Ensemble Learner for Combining the Results of Ready-made Entity Linking Systems

Entity linking (EL) is the task of automatically identifying entity mentions in text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. Throughout the past decade, a plethora of EL systems and pipelines have become available, where performance of individual systems varies heavily across corpora, languages or domains. Linking performance varies even between different mentions in the same text corpus, where, for instance, some EL approaches are better able to deal with short surface forms while others may perform better when more context information is available. To this end, we argue that performance may be optimised by exploiting results from distinct EL systems on the same corpus, thereby leveraging their individual strengths on a per-mention basis. In this paper, we introduce a supervised approach which exploits the output of multiple ready-made EL systems by predicting the correct link on a per-mention basis. Experimental results obtained on existing ground truth datasets and exploiting three state-of-the-art EL systems show the effectiveness of our approach and its capacity to significantly outperform the individual EL systems as well as a set of baseline methods.
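
The per-mention idea described above can be illustrated with a deliberately simplified stand-in. The sketch below does not reproduce the paper's supervised model: it only counts, per mention type, how often each system was right on labelled data and then trusts the best system for that type. The system names "A" and "B", the single short/long feature in `mention_bucket`, and all example entities are hypothetical.

```python
from collections import defaultdict


def mention_bucket(mention):
    """Hypothetical single feature: one-token vs. multi-token surface form."""
    return "short" if len(mention.split()) == 1 else "long"


def fit_selector(training):
    """Pick, per bucket, the system with the most correct links.

    training: list of (mention, {system_name: predicted_entity}, gold_entity).
    Assumes every system returns a prediction for every training mention,
    so raw hit counts are comparable across systems.
    """
    hits = defaultdict(lambda: defaultdict(int))
    for mention, predictions, gold in training:
        for system, entity in predictions.items():
            hits[mention_bucket(mention)][system] += int(entity == gold)
    return {b: max(scores, key=scores.get) for b, scores in hits.items()}


def link(mention, predictions, selector):
    """Return the prediction of the system selected for this mention's bucket."""
    system = selector.get(mention_bucket(mention))
    if system not in predictions:
        # Unseen bucket or missing system: fall back to any available output.
        system = next(iter(predictions))
    return predictions[system]


# Toy labelled data (invented for illustration only).
training = [
    ("Paris", {"A": "Paris", "B": "Paris_Hilton"}, "Paris"),
    ("Jordan", {"A": "Jordan", "B": "Michael_Jordan"}, "Jordan"),
    ("Michael Jordan", {"A": "Jordan", "B": "Michael_Jordan"}, "Michael_Jordan"),
]
selector = fit_selector(training)
print(selector)
print(link("Angela Merkel", {"A": "Merkel", "B": "Angela_Merkel"}, selector))
```

A learned classifier over richer mention features would replace the lookup table in `fit_selector`; the table is used here only to keep the example dependency-free.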

Preprint · 2020 · arXiv

The Role of Word-Eye-Fixations for Query Term Prediction

Throughout the search process, the user's gaze on inspected SERPs and websites can reveal his or her search interests. Gaze behavior can be captured with eye tracking and described with word-eye-fixations. Word-eye-fixations contain the user's accumulated gaze fixation duration on each individual word of a web page. In this work, we analyze the role of word-eye-fixations for predicting query terms. We investigate the relationship between a range of in-session features, in particular, gaze data, with the query terms and train models for predicting query terms. We use a dataset of 50 search sessions obtained through a lab study in the social sciences domain. Using established machine learning models, we can predict query terms with comparably high accuracy, even with only little training data. Feature analysis shows that the categories Fixation, Query Relevance and Session Topic contain the most effective features for our task.
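
As a rough illustration of the word-eye-fixation representation defined above (accumulated fixation duration per word), the following sketch sums durations from a list of fixation events. The event format `(word, duration_ms)`, the function name and the choice to merge words case-insensitively are assumptions for this example and are not taken from the paper.

```python
from collections import defaultdict


def word_eye_fixations(fixation_events):
    """Accumulate gaze fixation duration per word.

    fixation_events: iterable of (word, duration_ms) pairs (hypothetical format).
    Words are merged case-insensitively, which is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for word, duration_ms in fixation_events:
        totals[word.lower()] += duration_ms
    return dict(totals)


# Toy events (invented): two fixations on "election", one on "poll".
events = [("election", 220.0), ("poll", 180.0), ("Election", 140.0)]
fixation_totals = word_eye_fixations(events)
print(fixation_totals)
```

The resulting per-word totals could then serve as one input feature for a query term prediction model, alongside the other in-session features the abstract mentions.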

Preprint · 2020 · arXiv

TweetsCOV19 -- A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic

Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. With respect to the recent outbreak of the Coronavirus disease 2019 (COVID-19), online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigating measures and their societal impact. Understanding such discourse, its evolution, and interdependencies with real-world events or (mis)information can foster valuable insights. On the other hand, such corpora are crucial facilitators for computational methods addressing tasks such as sentiment analysis, event detection, or entity recognition. However, obtaining, archiving, and semantically annotating large amounts of tweets is costly. In this paper, we describe TweetsCOV19, a publicly available knowledge base of currently more than 8 million tweets, spanning October 2019 - April 2020. Metadata about the tweets as well as extracted entities, hashtags, user mentions, sentiments, and URLs are exposed using established RDF/S vocabularies, providing an unprecedented knowledge base for a range of knowledge discovery tasks. Next to a description of the dataset and its extraction and annotation process, we present an initial analysis and use cases of the corpus.