Source author record

Till Blume

Till Blume appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Databases Artificial Intelligence Digital Libraries Information Retrieval Machine Learning

Catalog footprint

What is connected

5works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Scaling R-GCN Training with Graph Summarization

Training of Relational Graph Convolutional Networks (R-GCN) is a memory intense task. The amount of gradient information that needs to be stored during training for real-world graphs is often too large for the amount of memory available on most GPUs. In this work, we experiment with the use of graph summarization techniques to compress the graph and hence reduce the amount of memory needed. After training the R-GCN on the graph summary, we transfer the weights back to the original graph and attempt to perform inference on it. We obtain reasonable results on the AIFB, MUTAG and AM datasets. Our experiments show that training on the graph summary can yield a comparable or higher accuracy to training on the original graphs.Furthermore, if we take the time to compute the summary out of the equation, we observe that the smaller graph representations obtained with graph summarization methods reduces the computational overhead. However, further experiments are needed to evaluate additional graph summary models and whether our findings also holds true for very large graphs.

preprint2022arXiv

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one perspective on document similarity that ignores which aspects make two documents alike. To address this limitation, aspect-based similarity measures have been developed using document segmentation or pairwise multi-class document classification. While segmentation harms the document coherence, the pairwise classification approach scales poorly to large scale corpora. In this paper, we treat aspect-based similarity as a classical vector similarity problem in aspect-specific embedding spaces. We represent a document not as a single generic embedding but as multiple specialized embeddings. Our approach avoids document segmentation and scales linearly w.r.t.the corpus size. In an empirical study, we use the Papers with Code corpus containing 157,606 research papers and consider the task, method, and dataset of the respective research papers as their aspects. We compare and analyze three generic document embeddings, six specialized document embeddings and a pairwise classification baseline in the context of research paper recommendations. As generic document embeddings, we consider FastText, SciBERT, and SPECTER. To compute the specialized document embeddings, we compare three alternative methods inspired by retrofitting, fine-tuning, and Siamese networks. In our experiments, Siamese SciBERT achieved the highest scores. Additional analyses indicate an implicit bias of the generic document embeddings towards the dataset aspect and against the method aspect of each research paper. Our approach of aspect-based document embeddings mitigates potential risks arising from implicit biases by making them explicit.

preprint2021arXiv

FLUID: A Common Model for Semantic Structural Graph Summaries Based on Equivalence Relations

Summarization is a widespread method for handling very large graphs. The task of structural graph summarization is to compute a concise but meaningful synopsis of the key structural information of a graph. As summaries may be used for many different purposes, there is no single concept or model of graph summaries. We have studied existing structural graph summaries for large-scale (semantic) graphs. Despite their different concepts and purposes, we found commonalities in the graph structures they capture. We use these commonalities to provide for the first time a formally defined common model, FLUID (FLexible graph sUmmarIes for Data graphs), that allows us to flexibly define structural graph summaries. FLUID allows graph summaries to be quickly defined, adapted, and compared for different purposes and datasets. To this end, FLUID provides features of structural summarization based on equivalence relations such as distinction of types and properties, direction of edges, bisimulation, and inference. We conduct a detailed complexity analysis of the features provided by FLUID. We show that graph summaries defined with FLUID can be computed in the worst case in time $\mathcal{O}(n^2)$ w.r.t. $n$, the number of edges in the data graph. An empirical analysis of large-scale web graphs with billions of edges indicates a typical running time of $Θ(n)$. Based on the formal FLUID model, one can quickly define and modify various structural graph summaries from the literature and beyond.

preprint2020arXiv

Indexing Data on the Web: A Comparison of Schema-level Indices for Data Search -- Extended Technical Report

Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various efforts have been conducted to develop specific index models for a given task. With each index model designed, implemented, and evaluated independently, it remains difficult to judge whether an approach generalizes well to another task, set of queries, or dataset. In this work, we empirically evaluate six representative index models with unique feature combinations. Among them is a new index model incorporating inferencing over RDFS and owl:sameAs. We implement all index models for the first time into a single, stream-based framework. We evaluate variations of the index models considering sub-graphs of size 0, 1, and 2 hops on two large, real-world datasets. We evaluate the quality of the indices regarding the compression ratio, summarization ratio, and F1-score denoting the approximation quality of the stream-based index computation. The experiments reveal huge variations in compression ratio, summarization ratio, and approximation quality for different index models, queries, and datasets. However, we observe meaningful correlations in the results that help to determine the right index model for a given task, type of query, and dataset.

preprint2020arXiv

Towards an Open Platform for Legal Information

Recent advances in the area of legal information systems have led to a variety of applications that promise support in processing and accessing legal documents. Unfortunately, these applications have various limitations, e.g., regarding scope or extensibility. Furthermore, we do not observe a trend towards open access in digital libraries in the legal domain as we observe in other domains, e.g., economics of computer science. To improve open access in the legal domain, we present our approach for an open source platform to transparently process and access Legal Open Data. This enables the sustainable development of legal applications by offering a single technology stack. Moreover, the approach facilitates the development and deployment of new technologies. As proof of concept, we implemented six technologies and generated metadata for more than 250,000 German laws and court decisions. Thus, we can provide users of our platform not only access to legal documents, but also the contained information.

Till Blume

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Scaling R-GCN Training with Graph Summarization

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

FLUID: A Common Model for Semantic Structural Graph Summaries Based on Equivalence Relations

Indexing Data on the Web: A Comparison of Schema-level Indices for Data Search -- Extended Technical Report

Towards an Open Platform for Legal Information