Source author record

Thomas Gottron

Thomas Gottron appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Databases; Information Retrieval; Machine Learning; Applications; Artificial Intelligence; Computation; Computation and Language; Computational Engineering, Finance, and Science; cs.CY; q-fin.ST; Social and Information Networks

Catalog footprint

What is connected

8 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Network topology of the Euro Area interbank market

The rapidly increasing availability of large amounts of granular financial data, paired with advances in big data technologies, creates a need for suitable analytics that can represent and extract meaningful information from such data. In this paper we propose a multi-layer network approach to distill the Euro Area (EA) banking system into distinct layers. Each layer of the network represents a specific type of financial relationship between banks, based on various sources of EA granular data collections. The resulting multi-layer network allows one to describe, analyze and compare the topology and structure of EA banks from different perspectives, eventually yielding a more complete picture of the financial market. This granular information representation has the potential to enable researchers and practitioners to better understand financial system dynamics as well as to support financial policies to manage and monitor financial risk from a more holistic point of view.

preprint, 2022, arXiv

Desiderata for Explainable AI in statistical production systems of the European Central Bank

Explainable AI constitutes a fundamental step towards establishing fairness and addressing bias in algorithmic decision-making. Despite the large body of work on the topic, the benefit of solutions is mostly evaluated from a conceptual or theoretical point of view and the usefulness for real-world use cases remains uncertain. In this work, we aim to state clear user-centric desiderata for explainable AI reflecting common explainability needs experienced in statistical production systems of the European Central Bank. We link the desiderata to archetypical user roles and give examples of techniques and methods which can be used to address the user's needs. To this end, we provide two concrete use cases from the domain of statistical data production in central banks: the detection of outliers in the Centralised Securities Database and the data-driven identification of data quality checks for the Supervisory Banking data system.

preprint, 2022, arXiv

Introducing explainable supervised machine learning into interactive feedback loops for statistical production system

Statistical production systems cover multiple steps from the collection, aggregation, and integration of data to tasks like data quality assurance and dissemination. While the context of data quality assurance is one of the most promising fields for applying machine learning, the lack of curated and labeled training data is often a limiting factor. The statistical production system for the Centralised Securities Database features an interactive feedback loop between data collected by the European Central Bank and data quality assurance performed by data quality managers at National Central Banks. The quality assurance feedback loop is based on a set of rule-based checks for raising exceptions, upon which the user either confirms the data or corrects an actual error. In this paper we use the information received from this feedback loop to optimize the exceptions presented to the National Central Banks thereby improving the quality of exceptions generated and the time consumed on the system by the users authenticating those exceptions. For this approach we make use of explainable supervised machine learning to (a) identify the types of exceptions and (b) to prioritize which exceptions are more likely to require an intervention or correction by the NCBs. Furthermore, we provide an explainable AI taxonomy aiming to identify the different explainable AI needs that arose during the project.

preprint, 2016, arXiv

Measuring the Accuracy of Linked Data Indices

Being based on Web technologies, Linked Data is distributed and decentralised in its nature. Hence, for the purpose of finding relevant Linked Data on the Web, search indices play an important role. Also for avoiding network communication overhead and latency, applications rely on indices or caches over Linked Data. These indices and caches are based on local copies of the original data and, thereby, introduce redundancy. Furthermore, as changes at the original Linked Data sources are not automatically propagated to the local copies, there is a risk of having inaccurate indices and caches due to outdated information. In this paper I discuss and compare methods for measuring the accuracy of indices. I will present different measures which have been used in related work and evaluate their advantages and disadvantages from a theoretic point of view as well as from a practical point of view by analysing their behaviour on real world data in an empirical experiment.

preprint, 2016, arXiv

TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud - An Extended Technical Report

Deciding which vocabulary terms to use when modeling data as Linked Open Data (LOD) is far from trivial. Choosing too general vocabulary terms, or terms from vocabularies that are not used by other LOD datasets, is likely to lead to a data representation, which will be harder to understand by humans and to be consumed by Linked data applications. In this technical report, we propose TermPicker: a novel approach for vocabulary reuse by recommending RDF types and properties based on exploiting the information on how other data providers on the LOD cloud use RDF types and properties to describe their data. To this end, we introduce the notion of so-called schema-level patterns (SLPs). They capture how sets of RDF types are connected via sets of properties within some data collection, e.g., within a dataset on the LOD cloud. TermPicker uses such SLPs and generates a ranked list of vocabulary terms for reuse. The lists of recommended terms are ordered by a ranking model which is computed using the machine learning approach Learning To Rank (L2R). TermPicker is evaluated based on the recommendation quality that is measured using the Mean Average Precision (MAP) and the Mean Reciprocal Rank at the first five positions (MRR@5). Our results illustrate an improvement of the recommendation quality by 29% - 36% when using SLPs compared to the beforehand investigated baselines of recommending solely popular vocabulary terms or terms from the same vocabulary. The overall best results are achieved using SLPs in conjunction with the Learning To Rank algorithm Random Forests.
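The MRR@5 measure named in the abstract can be illustrated with a minimal sketch. It assumes the usual definition: for each recommendation list, take the reciprocal of the rank of the first relevant term within the top five positions (zero if none appears), then average over all lists. The function names and the example vocabulary terms are hypothetical and are not taken from TermPicker itself.

```python
from typing import Sequence, Set, Tuple


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Reciprocal rank of the first relevant term in the top k, or 0.0 if none."""
    for position, term in enumerate(ranked[:k], start=1):
        if term in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank_at_k(
    runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 5
) -> float:
    """Average the per-list reciprocal rank over all (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Hypothetical recommendation lists, for illustration only.
example_runs = [
    (["foaf:name", "dc:title", "rdfs:label"], {"dc:title"}),  # first hit at rank 2
    (["a", "b", "c", "d", "e", "f"], {"f"}),                  # hit only at rank 6, outside top 5
]
mrr = mean_reciprocal_rank_at_k(example_runs, k=5)
```

The cutoff matters here: the second list contributes nothing because its only relevant term sits below position five.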

preprint, 2016, arXiv

The Impact of the Filter Bubble -- A Simulation Based Framework for Measuring Personalisation Macro Effects in Online Communities

The term filter bubble has been coined to describe the situation of online users who, due to filtering algorithms, live in a personalised information universe biased towards their own interests. In this paper we use an agent-based simulation framework to measure the actual risk and impact of filter bubble effects occurring in online communities due to content or author based personalisation algorithms. Observing the strength of filter bubble effects allows for weighing the benefits of personalisation against its risks. In our simulation we observed that filter bubble effects occur as soon as users indicate preferences towards certain topics. We also saw that well connected users are affected much more strongly than average or poorly connected users. Finally, our experimental setting indicated that the employed personalisation algorithm based on content features seems to bear a lower risk of filter bubble effects than one performing personalisation based on authors.

preprint, 2015, arXiv

Web Content Extraction - a Meta-Analysis of its Past and Thoughts on its Future

In this paper, we present a meta-analysis of several Web content extraction algorithms, and make recommendations for the future of content extraction on the Web. First, we find that nearly all Web content extractors do not consider a very large, and growing, portion of modern Web pages. Second, it is well understood that wrapper induction extractors tend to break as the Web changes; heuristic/feature engineering extractors were thought to be immune to a Web site's evolution, but we find that this is not the case: heuristic content extractor performance also tends to degrade over time due to the evolution of Web site forms and practices. We conclude with recommendations for future work that address these and other findings.

preprint, 2014, arXiv

A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing

We introduce a novel approach for building language models based on a systematic, recursive exploration of skip n-gram models which are interpolated using modified Kneser-Ney smoothing. Our approach generalizes language models as it contains the classical interpolation with lower order models as a special case. In this paper we motivate, formalize and present our approach. In an extensive empirical experiment over English text corpora we demonstrate that our generalized language models lead to a substantial reduction of perplexity between 3.1% and 12.7% in comparison to traditional language models using modified Kneser-Ney smoothing. Furthermore, we investigate the behaviour over three other languages and a domain specific corpus where we observed consistent improvements. Finally, we also show that the strength of our approach lies in its ability to cope in particular with sparse training data. Using a very small training data set of only 736 KB of text, we achieve a perplexity reduction of as much as 25.7%.
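The abstract does not spell out how skip n-grams are constructed, so the following is only a rough sketch under a stated assumption: a skip n-gram is formed by replacing one context word with a wildcard while keeping the final (predicted) word. The `WILDCARD` symbol and the `skip_patterns` helper are hypothetical names chosen for illustration, and the smoothing and interpolation steps are not shown.

```python
from typing import List, Tuple

WILDCARD = "_"  # hypothetical placeholder for a skipped word


def skip_patterns(ngram: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Return every variant of `ngram` with exactly one context word skipped.

    The last word (the one a language model would predict) is always kept,
    so an n-gram of length n yields n - 1 variants.
    """
    context_len = len(ngram) - 1
    variants = []
    for i in range(context_len):
        variants.append(ngram[:i] + (WILDCARD,) + ngram[i + 1:])
    return variants


patterns = skip_patterns(("the", "quick", "fox"))
# patterns == [("_", "quick", "fox"), ("the", "_", "fox")]
```

Applying the same helper again to each variant would give patterns with two skipped words, which is one way to read the "recursive exploration" mentioned in the abstract.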
