Source author record

Steffen Staab

Steffen Staab appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Artificial Intelligence; Computation and Language; cs.CY; Machine Learning; Databases; Computer Vision; Human-Computer Interaction; Social and Information Networks; cond-mat.stat-mech; Distributed, Parallel, and Cluster Computing; physics.soc-ph; Programming Languages; Robotics

Catalog footprint

What is connected

16 works

13 topics

4 close collaborators


preprint, 2026, arXiv

FutureSim: Replaying World Events to Evaluate Adaptive Agents

AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We build FutureSim, where agents forecast world events beyond their knowledge cutoff while interacting with a chronological replay of the world: real news articles arriving and questions resolving over the simulated period. We evaluate frontier agents in their native harness, testing their ability to predict world events over a three-month period from January to March 2026. FutureSim reveals a clear separation in their capabilities, with the best agent's accuracy being 25%, and many having worse Brier skill score than making no prediction at all. Through careful ablations, we show how FutureSim offers a realistic setting to study emerging research directions like long-horizon test-time adaptation, search, memory, and reasoning about uncertainty. Overall, we hope our benchmark design paves the way to measure AI progress on open-ended adaptation spanning long time-horizons in the real world.
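The Brier skill score mentioned above compares a forecaster's Brier score with that of a reference forecast; a negative value means the forecaster did worse than the reference. A minimal sketch, assuming binary questions and a constant 0.5 reference forecast. Both assumptions and the function names are illustrative and not taken from FutureSim, whose exact baseline is not stated here.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_prob=0.5):
    """1 - BS / BS_ref; positive means better than the reference forecast.

    reference_prob=0.5 is an assumed "uninformed" baseline for binary
    questions, not necessarily the baseline used by the benchmark.
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([reference_prob] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref


# A well-calibrated forecaster versus an overconfident, wrong one.
good = brier_skill_score([0.9, 0.1], [1, 0])
bad = brier_skill_score([0.9, 0.9], [0, 0])
print(good, bad)
```

Under this convention, an agent that is confidently wrong scores below zero, which is the situation the abstract describes as worse than making no prediction.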

preprint, 2026, arXiv

Leveraging Graph Structure in Seq2Seq Models for Knowledge Graph Link Prediction

We introduce Graph-Augmented Sequence-to-Sequence (GA-S2S), a novel framework that integrates a T5-small encoder-decoder with a Relational Graph Attention Network (RGAT) to improve link prediction in knowledge graphs. Existing Seq2Seq models rely solely on surface-level textual descriptions of entities and relations and, at best, flatten the neighborhood of a query entity into a single linear sequence, thereby discarding the inherent graph structure. In contrast, GA-S2S jointly encodes both textual features and the full $k$-hop subgraph topology surrounding the query entity. By integrating raw encoder outputs with RGAT's relation-aware embeddings, our model captures and leverages richer multi-hop relational patterns and textual information. Our preliminary experiments on the CoDEx dataset demonstrate that GA-S2S outperforms competitive Seq2Seq-based baseline models, achieving up to a 19% relative gain in link prediction accuracy.

preprint, 2026, arXiv

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising and resource-efficient path towards foundation models for relational databases.

preprint, 2026, arXiv

What Breaks Knowledge Graph based RAG? Benchmarking and Empirical Insights into Reasoning under Incomplete Knowledge

Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs. However, current evaluation practices fall short: existing benchmarks often include questions that can be directly answered using existing triples in the KG, making it unclear whether models perform reasoning or simply retrieve answers directly. Moreover, inconsistent evaluation metrics and lenient answer-matching criteria further obscure meaningful comparisons. In this work, we introduce a general method for constructing benchmarks and present BRINK (Benchmark for Reasoning under Incomplete Knowledge) to systematically assess KG-RAG methods under knowledge incompleteness. Our empirical results show that current KG-RAG methods have limited reasoning ability under missing knowledge, often rely on internal memorization, and exhibit varying degrees of generalization depending on their design.

preprint, 2023, arXiv

Predicting Eye Gaze Location on Websites

The World Wide Web, with websites and webpages as its main interface, facilitates the dissemination of important information. Hence it is crucial to optimize them for better user interaction, which is primarily done by analyzing users' behavior, especially users' eye-gaze locations. However, gathering these data is still considered to be labor- and time-intensive. In this work, we enable the development of automatic eye-gaze estimation given website screenshots as the input. This is done by curating a unified dataset that consists of website screenshots, eye-gaze heatmaps and the websites' layout information in the form of image and text masks. Our pre-processed dataset allows us to propose an effective deep learning-based model that leverages both image and text spatial locations, which are combined through an attention mechanism for effective eye-gaze prediction. In our experiment, we show the benefit of careful fine-tuning using our unified dataset to improve the accuracy of eye-gaze predictions. We further observe the capability of our model to focus on the targeted areas (images and text) to achieve high accuracy. Finally, the comparison with other alternatives shows the state-of-the-art result of our model, establishing the benchmark for the eye-gaze prediction task.

preprint, 2023, arXiv

SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks

Understanding traffic scenes requires considering heterogeneous information about dynamic agents and the static infrastructure. In this work we propose SCENE, a methodology to encode diverse traffic scenes in heterogeneous graphs and to reason about these graphs using a heterogeneous Graph Neural Network encoder and task-specific decoders. The heterogeneous graphs, whose structures are defined by an ontology, consist of different nodes with type-specific node features and different relations with type-specific edge features. In order to exploit all the information given by these graphs, we propose to use cascaded layers of graph convolution. The result is an encoding of the scene. Task-specific decoders can be applied to predict desired attributes of the scene. Extensive evaluation on two diverse binary node classification tasks shows the main strength of this methodology: despite being generic, it even manages to outperform task-specific baselines. The further application of our methodology to the task of node classification in various knowledge graphs shows its transferability to other domains.

preprint, 2022, arXiv

Formalizing Cost Fairness for Two-Party Exchange Protocols using Game Theory and Applications to Blockchain (Extended Version)

Existing fair exchange protocols usually neglect consideration of cost when assessing their fairness. However, in an environment with non-negligible transaction cost, e.g., public blockchains, high or unexpected transaction cost might be an obstacle for widespread adoption of fair exchange protocols in business applications. For example, as of 2021-12-17, the initialization of the FairSwap protocol on the Ethereum blockchain requires the selling party to pay a fee of approx. 349.20 USD per exchange. We address this issue by defining cost fairness, which can be used to assess two-party exchange protocols including implied transaction cost. We show that cost fairness cannot be achieved in an environment with non-negligible transaction cost where one party has to initialize the exchange protocol and the other party can leave the exchange at any time.

preprint, 2022, arXiv

Ultrahyperbolic Knowledge Graph Embeddings

Recent knowledge graph (KG) embeddings have been advanced by hyperbolic geometry due to its superior capability for representing hierarchies. The topological structures of real-world KGs, however, are rather heterogeneous, i.e., a KG is composed of multiple distinct hierarchies and non-hierarchical graph structures. Therefore, a homogeneous (either Euclidean or hyperbolic) geometry is not sufficient for fairly representing such heterogeneous structures. To capture the topological heterogeneity of KGs, we present an ultrahyperbolic KG embedding (UltraE) in an ultrahyperbolic (or pseudo-Riemannian) manifold that seamlessly interleaves hyperbolic and spherical manifolds. In particular, we model each relation as a pseudo-orthogonal transformation that preserves the pseudo-Riemannian bilinear form. The pseudo-orthogonal transformation is decomposed into various operators (i.e., circular rotations, reflections and hyperbolic rotations), allowing for simultaneously modeling heterogeneous structures as well as complex relational patterns. Experimental results on three standard KGs show that UltraE outperforms previous Euclidean- and hyperbolic-based approaches.
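To make the idea of a transformation that "preserves the pseudo-Riemannian bilinear form" concrete, the sketch below checks it numerically in the smallest possible case: a two-dimensional space with one positive and one negative dimension, where a hyperbolic rotation plays the role of the pseudo-orthogonal map. This is a toy illustration only; the signature, the sign convention, and the function names are assumptions and do not reproduce UltraE's parameterization.

```python
import math


def bilinear_form(x, y):
    """Indefinite form with signature (1, 1): x0*y0 - x1*y1 (assumed convention)."""
    return x[0] * y[0] - x[1] * y[1]


def hyperbolic_rotation(x, mu):
    """Apply the matrix [[cosh mu, sinh mu], [sinh mu, cosh mu]] to x."""
    c, s = math.cosh(mu), math.sinh(mu)
    return (c * x[0] + s * x[1], s * x[0] + c * x[1])


x, y = (2.0, 0.5), (-1.0, 3.0)
before = bilinear_form(x, y)
after = bilinear_form(hyperbolic_rotation(x, 0.7), hyperbolic_rotation(y, 0.7))
print(before, after)  # the two values agree up to floating-point error
```

The form is preserved because cosh² − sinh² = 1, in the same way that an ordinary rotation preserves the Euclidean dot product because cos² + sin² = 1.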

preprint, 2022, arXiv

User Interaction Analysis through Contrasting Websites Experience

Current advances of the internet allow rapid dissemination of information, accelerating progress across a wide spectrum of society. This has been done mainly through the use of website interfaces with inherently unique human interactions. In this regard, usability analysis becomes a central part of improving these human interactions. However, this analysis has not yet been quantitatively evaluated through user perception during interaction, especially when dealing with a wide range of tasks. In this study, we perform a quantitative analysis of the usability of websites based on their usage and relevance. We do this by reporting user interactions based on users' subjective perceptions, eye-tracking data and facial expressions, using data collected from two different sets of websites. In general, we found that the user interaction parameters are substantially different across website sets, with a degree of relation to perceived user emotions during interactions.

preprint, 2020, arXiv

Bias in Data-driven AI Systems -- An Introductory Survey

AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multi-disciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions, as well as to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful Machine Learning (ML) algorithms. Unless otherwise specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features like race, sex, etc.

preprint, 2020, arXiv

GeoSPARQL+: Syntax, Semantics and System for Integrated Querying of Graph, Raster and Vector Data -- Technical Report

We introduce an approach to semantically represent and query raster data in a Semantic Web graph. We extend the GeoSPARQL vocabulary and query language to support raster data as a new type of geospatial data. We define new filter functions and illustrate our approach using several use cases on real-world data sets. Finally, we describe a prototypical implementation and validate the feasibility of our approach.

preprint, 2020, arXiv

Time-invariant degree growth in preferential attachment network models

Preferential attachment drives the evolution of many complex networks. Its analytical studies mostly consider the simplest case of a network that grows uniformly in time despite the accelerating growth of many real networks. Motivated by the observation that the average degree growth of nodes is time-invariant in empirical network data, we study the degree dynamics in the relevant class of network models where preferential attachment is combined with heterogeneous node fitness and aging. We propose a novel analytical framework based on the time-invariance of the studied systems and show that it is self-consistent only for two special network growth forms: the uniform and exponential network growth. Conversely, the breaking of such time-invariance explains the winner-takes-all effect in some model settings, revealing the connection between the Bose-Einstein condensation in the Bianconi-Barabási model and similar gelation in superlinear preferential attachment. Aging is necessary to reproduce realistic node degree growth curves and can prevent the winner-takes-all effect under weak conditions. Our results are verified by extensive numerical simulations.
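The class of models described above, preferential attachment combined with node fitness and aging, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: one new node and one new edge per time step (uniform growth), and a power-law aging factor `age ** -decay`. The function name, the aging form, and the parameters are hypothetical and are not the paper's analytical framework.

```python
import random


def grow_network(n_nodes, fitness, decay=0.0, seed=0):
    """Grow a network one node per step; return the degree of every node.

    A new node t links to an existing node i with probability proportional
    to degree[i] * fitness[i] * (t - i) ** -decay (assumed aging form).
    """
    rng = random.Random(seed)
    degree = [1, 1]  # start from two nodes joined by a single edge
    for t in range(2, n_nodes):
        weights = [
            degree[i] * fitness[i] * (t - i) ** (-decay)  # t - i is node i's age
            for i in range(t)
        ]
        target = rng.choices(range(t), weights=weights, k=1)[0]
        degree[target] += 1
        degree.append(1)  # the new node arrives with its one edge
    return degree


n = 200
no_aging = grow_network(n, fitness=[1.0] * n, decay=0.0)
with_aging = grow_network(n, fitness=[1.0] * n, decay=2.0)
print(max(no_aging), max(with_aging))
```

Comparing the largest degree with and without aging gives a rough feel for how aging can limit hub growth, though a single seeded run proves nothing about the general behaviour analysed in the paper.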

preprint, 2016, arXiv

LambdaDL: Syntax and Semantics (Preliminary Report)

Semantic data fuels many different applications, but is still lacking proper integration into programming languages. Untyped access is error-prone while mapping approaches cannot fully capture the conceptualization of semantic data. In this paper, we present $λ_{DL}$, a $λ$-calculus with a modified type system to provide type-safe integration of semantic data. This is achieved by the integration of description logics into the $λ$-calculus for typing and data access. It is centered around several key design principles. Among these are (1) the usage of semantic conceptualizations as types, (2) subtype inference for these types, and (3) type-checked query access to the data by both ensuring the satisfiability of queries as well as typing query results precisely in $λ_{DL}$. The paper motivates the use of a modified type system for semantic data and it provides the theoretic foundation for the integration of description logics as well as the core formal specifications of $λ_{DL}$ including a proof of type safety.

preprint, 2016, arXiv

Observing and Recommending from a Social Web with Biases

The research question this report addresses is: how, and to what extent, those directly involved with the design, development and employment of a specific black box algorithm can be certain that it is not unlawfully discriminating (directly and/or indirectly) against particular persons with protected characteristics (e.g. gender, race and ethnicity)?

preprint, 2015, arXiv

Voting Behaviour and Power in Online Democracy: A Study of LiquidFeedback in Germany's Pirate Party

In recent years, political parties have adopted Online Delegative Democracy platforms such as LiquidFeedback to organise themselves and their political agendas via a grassroots approach. A common objection against the use of these platforms is the delegation system, where a user can delegate his vote to another user, giving rise to so-called super-voters, i.e. powerful users who receive many delegations. It has been asserted in the past that the presence of these super-voters undermines the democratic process, and therefore delegative democracy should be avoided. In this paper, we look at the emergence of super-voters in the largest delegative online democracy platform worldwide, operated by Germany's Pirate Party. We investigate the distribution of power within the party systematically, study whether super-voters exist, and explore the influence they have on the outcome of votings conducted online. While we find that the theoretical power of super-voters is indeed high, we also observe that they use their power wisely. Super-voters do not fully act on their power to change the outcome of votes, but they vote in favour of proposals with the majority of voters in many cases thereby exhibiting a stabilising effect on the system. We use these findings to present a novel class of power indices that considers observed voting biases and gives significantly better predictions than state-of-the-art measures.
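The delegation mechanism that produces super-voters can be sketched as follows: each voter either votes directly or passes their vote on, and a vote travels along the delegation chain until it reaches someone who votes directly. This is a minimal illustration assuming transitive delegation and assuming that votes caught in a delegation cycle are discarded. The data layout and function name are hypothetical and do not reflect LiquidFeedback's actual implementation or the power indices proposed in the paper.

```python
from collections import Counter


def voting_weights(voters, delegations):
    """Count how many votes each direct voter ends up casting.

    delegations maps a voter to the person they delegate to; voters
    missing from the mapping vote directly.
    """
    weights = Counter()
    for voter in voters:
        seen = set()
        current = voter
        # Follow the chain until a direct voter or a cycle is reached.
        while current in delegations and current not in seen:
            seen.add(current)
            current = delegations[current]
        if current not in delegations:
            weights[current] += 1  # vote lands on a direct voter
        # Otherwise the chain is cyclic and the vote is dropped (assumption).
    return weights


voters = ["ann", "bob", "cem", "dee"]
delegations = {"ann": "bob", "bob": "cem"}
print(voting_weights(voters, delegations))
```

Here "cem" carries three votes (their own plus two delegated ones) while "dee" carries one, which is the kind of concentration the abstract refers to as super-voters.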

preprint, 2014, arXiv

A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing

We introduce a novel approach for building language models based on a systematic, recursive exploration of skip n-gram models which are interpolated using modified Kneser-Ney smoothing. Our approach generalizes language models as it contains the classical interpolation with lower order models as a special case. In this paper we motivate, formalize and present our approach. In an extensive empirical experiment over English text corpora we demonstrate that our generalized language models lead to a substantial reduction of perplexity between 3.1% and 12.7% in comparison to traditional language models using modified Kneser-Ney smoothing. Furthermore, we investigate the behaviour over three other languages and a domain-specific corpus, where we observed consistent improvements. Finally, we also show that the strength of our approach lies in its ability to cope in particular with sparse training data. Using a very small training data set of only 736 KB of text, we achieve a perplexity reduction of as much as 25.7%.
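The skip n-gram idea can be illustrated by enumerating the skipped variants of a single n-gram: every subset of context words is replaced by a wildcard while the predicted word stays fixed. This is a minimal sketch of the enumeration only. The wildcard symbol, the choice to keep the last word as the prediction target, and the function name are assumptions for illustration, and the modified Kneser-Ney interpolation over these variants is not shown.

```python
from itertools import combinations


def skipped_variants(ngram, wildcard="_"):
    """Return all variants of ngram with any subset of context words skipped.

    The last word is treated as the prediction target and is never skipped.
    """
    *context, target = ngram
    variants = []
    for k in range(len(context) + 1):
        for positions in combinations(range(len(context)), k):
            skipped = tuple(
                wildcard if i in positions else word
                for i, word in enumerate(context)
            )
            variants.append(skipped + (target,))
    return variants


for variant in skipped_variants(("the", "quick", "fox")):
    print(variant)
```

A context of length m yields 2^m variants, so a trigram produces four. Skipping only the leftmost context words recovers the usual back-off to lower-order models, which is consistent with the abstract's remark that classical interpolation is a special case.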
