Source author record

Andreas Spitz

Andreas Spitz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computation and Language, Information Retrieval, cond-mat.dis-nn, Databases, Machine Learning, physics.atom-ph, physics.data-an

Catalog footprint

4 works, 7 topics, 4 close collaborators


Preprint, 2022, arXiv

A Network Approach to Atomic Spectra

Network science provides a universal framework for modeling complex systems, in contrast to the reductionist approach generally adopted in physics. In a prototypical study, we use network models created from spectroscopic data of atoms to predict microscopic properties of the underlying physical system. For simple atoms such as helium, an a posteriori inspection of spectroscopic network communities reveals the emergence of quantum numbers and symmetries. For more complex atoms such as thorium, finer network hierarchies suggest additional microscopic symmetries or configurations. Link prediction yields a quantitative ranking of yet unknown atomic transitions, offering opportunities to discover new spectral lines in a well-controlled manner. Our work promotes a genuine bi-directional exchange of methodology between network science and physics, and presents new perspectives for the study of atomic spectra.

Preprint, 2022, arXiv

Quote Erat Demonstrandum: A Web Interface for Exploring the Quotebank Corpus

The use of attributed quotes is the most direct and least filtered pathway of information propagation in news. Consequently, quotes play a central role in the conception, reception, and analysis of news stories. Since quotes provide a more direct window into a speaker's mind than regular reporting, they are a valuable resource for journalists and researchers alike. While substantial research efforts have been devoted to methods for the automated extraction of quotes from news and their attribution to speakers, few comprehensive corpora of attributed quotes from contemporary sources are available to the public. Here, we present an adaptive web interface for searching Quotebank, a massive collection of quotes from the news, which we make available at https://quotebank.dlab.tools.

Preprint, 2022, arXiv

Strong Heuristics for Named Entity Linking

Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods. However, such methods tend to come with caveats, such as no integration of suitable knowledge bases (like Wikidata) for emerging entities, a lack of scalability, and poor interpretability. Here, we consider person disambiguation in Quotebank, a massive corpus of speaker-attributed quotations from the news, and investigate the suitability of intuitive, lightweight, and scalable heuristics for NEL in web-scale corpora. Our best performing heuristic disambiguates 94% and 63% of the mentions on Quotebank and the AIDA-CoNLL benchmark, respectively. Additionally, the proposed heuristics compare favourably to the state-of-the-art unsupervised and zero-shot methods, Eigenthemes and mGENRE, respectively, thereby serving as strong baselines for unsupervised and zero-shot entity linking.

Preprint, 2020, arXiv

Word Embeddings for Entity-annotated Texts

Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but the performance of the non-entity word embeddings also degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.
