Researcher profile

Ingo Scholtes

Ingo Scholtes contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

Learning Neural Operator Surrogates for the Black Hole Accretion Code

General-relativistic magnetohydrodynamic (GR-MHD) simulations are essential for studying black hole accretion, relativistic jets, and magnetic reconnection, yet their computational cost severely limits systematic parameter exploration. We investigate neural operator surrogates for two astrophysically relevant simulation scenarios produced by the Black Hole Accretion Code (\texttt{BHAC}). First, a Physics Informed Fourier Neural Operator (PINO) is trained on the special-relativistic resistive MHD (SRRMHD) evolution of the Orszag-Tang vortex over a range of resistivities spanning the Sweet-Parker and fast reconnection regimes. By embedding the governing equations as an additional loss term evaluated at finer temporal resolution than the available data supervision, the model learns dynamics at time steps where no simulation data is provided, enabling recovery of plasmoid formation that a data-only baseline trained on the same sparse snapshots fails to reproduce. To our knowledge, the present work is the first application of a physics informed neural operator to special relativistic resistive MHD, and the first to investigate the capability of such models to resolve plasmoid formation in SRRMHD. In a second line of investigation, an OFormer-style Transformer Neural Operator is trained on the evolution of spine-sheath relativistic jets created with \texttt{BHAC}, in special-relativistic MHD (SRMHD). The model is directly applied on the adaptive mesh, highlighting the need for linear attention due to long sequences. The neural surrogate model is capable of capturing most of the major details, especially in early predictions. To our knowledge, this constitutes the first application of a neural operator directly on a high resolution adaptive mesh refinement grid in the context of MHD simulations.

preprint2026arXiv

The Role of Node Features in Graph Pooling

Graph pooling is commonly applied in graph classification, yet its empirical gains over standard WL-1 expressive GNNs are often marginal or inconsistent. We study this gap by analysing the interaction between node features and graph topology and their effect on pooling objectives. Our analysis reveals that pooling operators require node features that are well-aligned with the graph's topology -- a condition often overlooked and not guaranteed in empirical networks. We formalise fundamental requirements for node features to enable effective pooling, and introduce a quantitative measure of feature quality. Our empirical evaluation shows that, when these requirements are satisfied, pooling can be beneficial and improve performance on appropriate datasets.

preprint2022arXiv

Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set

Massive data from software repositories and collaboration tools are widely used to study social aspects in software development. One question that several recent works have addressed is how a software project's size and structure influence team productivity, a question famously considered in Brooks' law. Recent studies using massive repository data suggest that developers in larger teams tend to be less productive than smaller teams. Despite using similar methods and data, other studies argue for a positive linear or even super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects are diseconomies of scale. In our work, we study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide, to the best of our knowledge, the largest, curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity. Our work contributes to the ongoing discussion on the choice of productivity metrics in the operationalisation of hypotheses about determinants of successful software projects. It further highlights general pitfalls in big data analysis and shows that the use of bigger data sets does not automatically lead to more reliable insights.

preprint2022arXiv

Sequential Motifs in Observed Walks

The structure of complex networks can be characterized by counting and analyzing network motifs. Motifs are small subgraphs that occur repeatedly in a network, such as triangles or chains. Recent work has generalized motifs to temporal and dynamic network data. However, existing techniques do not generalize to sequential or trajectory data, which represents entities moving through the nodes of a network, such as passengers moving through transportation networks. The unit of observation in these data is fundamentally different, since we analyze full observations of trajectories (e.g., a trip from airport A to airport C through airport B), rather than independent observations of edges or snapshots of graphs over time. In this work, we define sequential motifs in trajectory data, which are small, directed, and edge-weighted subgraphs corresponding to patterns in observed sequences. We draw a connection between counting and analysis of sequential motifs and Higher-Order Network (HON) models. We show that by mapping edges of a HON, specifically a $k$th-order DeBruijn graph, to sequential motifs, we can count and evaluate their importance in observed data. We test our methodology with two datasets: (1) passengers navigating an airport network and (2) people navigating the Wikipedia article network. We find that the most prevalent and important sequential motifs correspond to intuitive patterns of traversal in the real systems, and show empirically that the heterogeneity of edge weights in an observed higher-order DeBruijn graph has implications for the distributions of sequential motifs we expect to see across our null models.

preprint2020arXiv

HOTVis: Higher-Order Time-Aware Visualisation of Dynamic Graphs

Network visualisation techniques are important tools for the exploratory analysis of complex systems. While these methods are regularly applied to visualise data on complex networks, we increasingly have access to time series data that can be modelled as temporal networks or dynamic graphs. In dynamic graphs, the temporal ordering of time-stamped edges determines the causal topology of a system, i.e., which nodes can, directly and indirectly, influence each other via a so-called causal path. This causal topology is crucial to understand dynamical processes, assess the role of nodes, or detect clusters. However, we lack graph drawing techniques that incorporate this information into static visualisations. Addressing this gap, we present a novel dynamic graph visualisation algorithm that utilises higher-order graphical models of causal paths in time series data to compute time-aware static graph visualisations. These visualisations combine the simplicity and interpretability of static graphs with a time-aware layout algorithm that highlights patterns in the causal topology that result from the temporal dynamics of edges.

preprint2020arXiv

HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.

preprint2020arXiv

Learning the Markov order of paths in a network

We study the problem of learning the Markov order in categorical sequences that represent paths in a network, i.e. sequences of variable lengths where transitions between states are constrained to a known graph. Such data pose challenges for standard Markov order detection methods and demand modelling techniques that explicitly account for the graph constraint. Adopting a multi-order modelling framework for paths, we develop a Bayesian learning technique that (i) more reliably detects the correct Markov order compared to a competing method based on the likelihood ratio test, (ii) requires considerably less data compared to methods using AIC or BIC, and (iii) is robust against partial knowledge of the underlying constraints. We further show that a recently published method that uses a likelihood ratio test has a tendency to overfit the true Markov order of paths, which is not the case for our Bayesian technique. Our method is important for data scientists analyzing patterns in categorical sequence data that are subject to (partially) known constraints, e.g. sequences with forbidden words, mobility trajectories and click stream data, or sequence data in bioinformatics. Addressing the key challenge of model selection, our work is further relevant for the growing body of research that emphasizes the need for higher-order models in network analysis.

preprint2019arXiv

Counting Causal Paths in Big Times Series Data on Networks

Graph or network representations are an important foundation for data mining and machine learning tasks in relational data. Many tools of network analysis, like centrality measures, information ranking, or cluster detection rest on the assumption that links capture direct influence, and that paths represent possible indirect influence. This assumption is invalidated in time-stamped network data capturing, e.g., dynamic social networks, biological sequences or financial transactions. In such data, for two time-stamped links (A,B) and (B,C) the chronological ordering and timing determines whether a causal path from node A via B to C exists. A number of works has shown that for that reason network analysis cannot be directly applied to time-stamped network data. Existing methods to address this issue require statistics on causal paths, which is computationally challenging for big data sets. Addressing this problem, we develop an efficient algorithm to count causal paths in time-stamped network data. Applying it to empirical data, we show that our method is more efficient than a baseline method implemented in an OpenSource data analytics package. Our method works efficiently for different values of the maximum time difference between consecutive links of a causal path and supports streaming scenarios. With it, we are closing a gap that hinders an efficient analysis of big time series data on complex networks.

preprint2017arXiv

From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.