Source author record

Philipp Mayr

Philipp Mayr appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Digital Libraries, Information Retrieval, Human-Computer Interaction, Computation and Language, cs.CY, physics.soc-ph, Social and Information Networks

Catalog footprint

What is connected

45 works

7 topics

4 close collaborators

preprint 2026 arXiv

Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR

This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying changes in language, terminology and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. We focus on knowledge transfer from fiction to non-fiction, investigating how narrative understanding and semantic richness in fiction can improve retrieval for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.

preprint 2026 arXiv

MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval

Users increasingly expect modern search systems to offer a unified interface that seamlessly retrieves information from diverse data sources and formats. However, current information retrieval (IR) evaluation benchmarks have not kept pace with this development, primarily due to the lack of test collections that represent the diversity of contemporary search domains. We address this critical gap with MIRA, a novel benchmark based on a large-scale social science search platform. MIRA is designed for category-aware ranking across heterogeneous categories - Publications, Research Data, Variables, and Instruments & Tools - within a single, unified evaluation framework. The proposed collection is distinctive in several ways: (1) it is built upon real user queries, providing a more realistic basis for evaluation; (2) it covers scholarly items from four distinct categories, enabling multi-faceted evaluation; and (3) it leverages a Large Language Model to generate topic descriptions and narratives, as well as for relevance assessment with respect to these topics, substantially reducing the labor and cost of test collection generation. We release this resource to benefit the community by providing a foundational testbed for the research on multi-faceted, category-aware, integrated, or cross-category information retrieval.

preprint 2022 arXiv

Evaluation of Embedding Models for Automatic Extraction and Classification of Acknowledged Entities in Scientific Documents

Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the acknowledgment text in scientific papers. We trained and implemented a named entity recognition (NER) task using the Flair NLP framework. The training was conducted using three default Flair NER models with two differently-sized corpora. The Flair Embeddings model trained on the larger training corpus showed the best accuracy of 0.77. Our model is able to recognize six entity types: funding agency, grant number, individuals, university, corporation and miscellaneous. The model is more precise for some entity types than for others: individuals and grant numbers achieved very good F1-scores of over 0.9. Most previous work on acknowledgement analysis was limited by the manual evaluation of data and therefore by the amount of processed data. This model can be applied to the comprehensive analysis of acknowledgement texts and may potentially make a great contribution to the field of automated acknowledgement analysis.

preprint 2022 arXiv

Studying Retrievability of Publications and Datasets in an Integrated Retrieval System

In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the ease with which the content in the collection can be accessed. Following this methodology, in our study, we propose a system-oriented approach for studying dataset and publication retrieval. A special feature of this paper is its focus on measuring the accessibility biases of various types of DL items and its inclusion of a metric of usefulness. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences between the two retrievable document types (specifically datasets and publications). Empirical results reported in the paper show a distinguishable diversity in the retrievability scores among the documents of different types.
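
As a rough illustration of the Lorenz curve and Gini coefficient mentioned in the abstract above, the following sketch computes both from a list of per-document retrievability scores. It is not the paper's implementation: the function names `lorenz_points` and `gini` and the example scores are hypothetical, and the scores are assumed to be non-negative numbers that have already been computed.

```python
def lorenz_points(scores):
    """Return Lorenz curve points as (share of documents, cumulative share of score)."""
    xs = sorted(scores)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    if n == 0 or total == 0:
        return points
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(scores):
    """Gini coefficient of non-negative scores: 0 means perfectly equal scores."""
    xs = sorted(scores)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending scores (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical retrievability scores for two document types.
dataset_scores = [0, 0, 1, 2, 9]
publication_scores = [2, 3, 3, 4, 4]
print(gini(dataset_scores), gini(publication_scores))
```

A higher Gini value for one document type would indicate that retrievability is concentrated on fewer of its documents, which is the kind of accessibility bias the abstract describes.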

preprint 2022 arXiv

The many facets of academic mobility and its impact on scholars' career

International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. The aim of this study is twofold. First, we investigate to what extent individual factors are associated with the mobility of researchers. Second, we explore the relationship between mobility and scientific activity and impact. For this purpose, we used a bibliometric approach to track the mobility of authors. To compare the scientific outcomes of researchers, we considered the number of publications and received citations as indicators, as well as the number of unique co-authors in all their publications. We also analysed the co-authorship network of researchers and compared centrality measures of mobile and non-mobile researchers. Results show that researchers from North America and Sub-Saharan Africa, particularly female ones, have the lowest and highest tendency towards international mobility, respectively. Having international co-authors increases the probability of international movement. Our findings uncover gender inequality in international mobility across scientific fields and countries. Across genders, researchers in the Physical sciences have the highest and those in the Social sciences the lowest rate of mobility. We observed more mobility for Social scientists at the advanced career stage, while researchers in other fields prefer to move at earlier career stages. Also, we found a positive correlation between mobility and scientific outcomes, but no apparent difference between females and males. Comparing the centrality of mobile and non-mobile researchers in the co-authorship networks reveals a higher social capital advantage for mobile researchers.

preprint 2022 arXiv

Towards Automated Survey Variable Search and Summarization in Social Science Publications

Nowadays there is a growing trend in many scientific disciplines to support researchers by providing enhanced information access through linking of publications and underlying datasets, so as to support research with infrastructure to enhance reproducibility and reusability of research results. In this research note, we present an overview of an ongoing research project, named VADIS (VAriable Detection, Interlinking and Summarization), that aims at developing technology and infrastructure for enhanced information access in the Social Sciences via search and summarization of publications on the basis of automatic identification and indexing of survey variables in text. We provide an overview of the overarching vision underlying our project, its main components, and related challenges, as well as a thorough discussion of how these are meant to address the limitations of current information access systems for publications in the Social Sciences. We show how this goal can be concretely implemented in an end-user system by presenting a search prototype, which is based on user requirements collected from qualitative interviews with empirical Social Science researchers.

preprint 2020 arXiv

Bibliometric-enhanced Information Retrieval 10th Anniversary Workshop Edition

The Bibliometric-enhanced Information Retrieval workshop series (BIR) was launched at ECIR in 2014 (Mayr et al., 2014) and has been held at ECIR each year since then. This year we organize the 10th iteration of BIR. The workshop series at ECIR and JCDL/SIGIR tackles issues related to academic search, at the crossroads between Information Retrieval, Natural Language Processing and Bibliometrics. In this overview paper, we summarize the past workshops, present the workshop topics for 2020 and reflect on some future steps for this workshop series.

preprint 2020 arXiv

ECIR 2020 Workshops: Assessing the Impact of Going Online

ECIR 2020 (https://ecir2020.org/) was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organizers and the workshop participants. We provide a report on the organizational aspect of these events and the consequences for participants. Covering the scientific dimension of each workshop is outside the scope of this article.

preprint 2020 arXiv

The OpenCitations Data Model

A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or context application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community.

preprint 2016 arXiv

A Semi-Automatic Approach for Detecting Dataset References in Social Science Texts

Today, full texts of scientific articles are often stored in different locations than the datasets they use. Dataset registries aim at a closer integration by making datasets citable, but authors typically refer to datasets using inconsistent abbreviations and heterogeneous metadata (e.g. title, publication year). It is thus hard to reproduce research results, to access datasets for further analysis, and to determine the impact of a dataset. Manually detecting references to datasets in scientific articles is time-consuming and requires expert knowledge in the underlying research domain. We propose and evaluate a semi-automatic three-step approach for finding explicit references to datasets in social sciences articles. We first extract pre-defined special features from dataset titles in the da|ra registry, then detect references to datasets using the extracted features, and finally match the references found with corresponding dataset titles. The approach does not require a corpus of articles (avoiding the cold start problem) and performs well on a test corpus. We achieved an F-measure of 0.84 for detecting references in full texts and an F-measure of 0.83 for finding correct matches of detected references in the da|ra dataset registry.

preprint 2016 arXiv

Bibliometrics and Information Retrieval: Creating Knowledge through Research Synergies

This panel brings together experts in bibliometrics and information retrieval to discuss how each of these two important areas of information science can help to inform the research of the other. There is a growing body of literature that capitalizes on the synergies created by combining methodological approaches of each to solve research problems and practical issues related to how information is created, stored, organized, retrieved and used. The session will begin with an overview of the common threads that exist between IR and metrics, followed by a summary of findings from the BIR workshops and examples of research projects that combine aspects of each area to benefit IR or metrics research areas, including search results ranking, semantic indexing and visualization. The panel will conclude with an engaging discussion with the audience to identify future areas of research and collaboration.

preprint 2016 arXiv

Identifying and Improving Dataset References in Social Sciences Full Texts

Scientific full-text papers are usually stored in different places from their underlying research datasets. Authors typically refer to datasets by mentioning them, for example by using their titles and the year of publication. However, in most cases explicit links that would provide readers with direct access to referenced datasets are missing. Manually detecting references to datasets in papers is time-consuming and requires an expert in the domain of the paper. In order to make explicit all links to datasets in papers that have been published already, we suggest and evaluate a semi-automatic approach for finding references to datasets in social sciences papers. Our approach does not need a corpus of papers (no cold start problem) and it performs well on a small test corpus (gold standard). Our approach achieved an F-measure of 0.84 for identifying references in full texts and an F-measure of 0.83 for finding correct matches of detected references in the da|ra dataset registry.

preprint 2016 arXiv

Opening Scholarly Communication in Social Sciences: Supporting Open Peer Review with Fidus Writer

Our system will initially provide readers, authors and reviewers with an alternative, thus having the potential to gain wider acceptance and gradually replace the old, incoherent publication process of our journals and of others in related fields. It will make journals that are already open access more "open" (in terms of reusability), and it has the potential to serve as an incentive for turning "closed" journals into open access ones. In this poster paper we will present the framework of the OSCOSS system and highlight the reviewer use case.

preprint 2015 arXiv

Assessing a human mediated current awareness service

In this paper, we present an approach for analyzing the behavior of editors in the large current awareness service "NEP: New Economics Papers". We processed data from more than 38,000 issues derived from 90 different NEP reports over the past ten years. The aim of our analysis was to gain insight into editor behavior when creating an issue and to look for factors that influence the success of a report. In our study we looked at the following features: the average editing time, the average number of papers in an issue and the editor effort measured on presorted issues as relative search length (RSL). We found an average issue size of 12.4 documents per issue. The average editing time is rather low at 14.5 minutes. We conclude that the success of a report is mainly driven by its topic and the number of subscribers, as well as by proactive action by the editor to promote the report in her community.

preprint 2015 arXiv

Bibliometric-enhanced Information Retrieval: 2nd International BIR Workshop

This workshop brings together experts from communities which have often been perceived as different ones: bibliometrics / scientometrics / informetrics on the one side and information retrieval on the other. Our motivation as organizers of the workshop started from the observation that the main discourses in both fields are different and that the communities only partly overlap, and from the belief that a knowledge transfer would be profitable for both sides. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. On the other side, more and more information professionals working in libraries and archives are confronted with applying bibliometric techniques in their services. Knowledge exchange thus becomes more urgent. The first workshop set the research agenda by introducing each community's methods to the other, reporting on current research problems and brainstorming about common interests. This follow-up workshop continues the overall communication, but also puts one problem into focus. In particular, we will explore how statistical modelling of scholarship can improve retrieval services for specific communities, as well as for large, cross-domain collections like Mendeley or ResearchGate. This second BIR workshop continues to raise awareness of the missing link between Information Retrieval (IR) and bibliometrics and contributes to creating a common ground for the incorporation of bibliometric-enhanced services into retrieval at the scholarly search engine interface.

preprint 2015 arXiv

Bibliometric-Enhanced Information Retrieval: 3rd International BIR Workshop

The BIR workshop brings together experts in Bibliometrics and Information Retrieval. While sometimes perceived as rather loosely related, these research areas share various interests and face similar challenges. Our motivation as organizers of the BIR workshop stemmed from a twofold observation. First, both communities only partly overlap, albeit sharing various interests. Second, it will be profitable for both sides to tackle some of the emerging problems that scholars face today when they have to identify relevant and high-quality literature in the fast-growing number of electronic publications available worldwide. Bibliometric techniques are not yet used widely to enhance retrieval processes in digital libraries, although they offer value-added effects for users. Information professionals working in libraries and archives, however, are increasingly confronted with applying bibliometric techniques in their services. The first BIR workshop in 2014 set the research agenda by introducing each group to the other, illustrating state-of-the-art methods, reporting on current research problems, and brainstorming about common interests. The second workshop in 2015 further elaborated these themes. This third BIR workshop aims to foster a common ground for the incorporation of bibliometric-enhanced services into scholarly search engine interfaces. In particular we will address specific communities, as well as studies on large, cross-domain collections like Mendeley and ResearchGate. This third BIR workshop explicitly addresses both scholarly and industrial researchers.

preprint 2015 arXiv

Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The goals of the workshop were to answer questions like: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is the first step to foster reflection on this interdisciplinarity and on the benefits that the two disciplines, Bibliometrics and Natural Language Processing, can derive from it.

preprint 2015 arXiv

Editorial for the Proceedings of the Workshop Knowledge Maps and Information Retrieval (KMIR2014) at Digital Libraries 2014

Knowledge maps are promising tools for visualizing the structure of large-scale information spaces, but they are still far from being applicable for searching. The first international workshop on "Knowledge Maps and Information Retrieval (KMIR)", held as part of the International Conference on Digital Libraries 2014 in London, aimed at bringing together experts in Information Retrieval (IR) and knowledge mapping in order to discuss the potential of interactive knowledge maps for information seeking purposes.

preprint 2015 arXiv

Extending search facilities via bibliometric-enhanced stratagems

The paper introduces simple bibliometric-enhanced search facilities which are derived from the famous stratagems by Bates. Moves, tactics and stratagems are revisited from a Digital Library perspective. The potential of extended versions of "journal run" or "citation search" for interactive information retrieval is outlined. The authors elaborate on the future implementation and evaluation of new bibliometric-enhanced search services.

preprint 2015 arXiv

Mining Scientific Papers for Bibliometrics: a (very) Brief Survey of Methods and Tools

The Open Access movement in scientific publishing and search engines like Google Scholar have made scientific articles more broadly accessible. During the last decade, the availability of scientific papers in full text has become more and more widespread thanks to the growing number of publications on online platforms such as ArXiv and CiteSeer. The efforts to provide articles in machine-readable formats and the rise of Open Access publishing have resulted in a number of standardized formats for scientific papers (such as NLM-JATS, TEI, DocBook). Our aim is to stimulate research at the intersection of Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing.

preprint 2014 arXiv

Are topic-specific search term, journal name and author name recommendations relevant for researchers?

In this paper we describe a case study where researchers in the social sciences (n=19) assess topical relevance for controlled search terms, journal names and author names which have been compiled automatically by bibliometric-enhanced information retrieval (IR) services. We call these bibliometric-enhanced IR services Search Term Recommender (STR), Journal Name Recommender (JNR) and Author Name Recommender (ANR) in this paper. The researchers in our study (practitioners, PhD students and postdocs) were asked to assess the top n pre-processed recommendations from each recommender for specific research topics which they had named in an interview before the experiment. Our results show clearly that the presented search term, journal name and author name recommendations are highly relevant to the researchers' topics and can easily be integrated for search in Digital Libraries. The average precision for top-ranked recommendations is 0.75 for author names, 0.74 for search terms and 0.73 for journal names. The relevance distribution differs largely across topics and researcher types. Practitioners seem to favor author name recommendations, while postdocs rated author name recommendations the lowest. In the experiment the small postdoc group (n=3) favored journal name recommendations.

preprint 2014 arXiv

Assessing Educational Research -- An Information Service for Monitoring a Heterogeneous Research Field

The paper presents a web prototype that visualises different characteristics of research projects in the heterogeneous domain of educational research. The concept of the application derives from the project "Monitoring Educational Research" (MoBi) that aims at identifying and implementing indicators that adequately describe structural properties and dynamics of the research field. The prototype enables users to visualise data regarding different indicators, e.g. "research activity", "funding", "qualification project", "disciplinary area". Since the application is based on Semantic MediaWiki technology, it furthermore provides an easily accessible opportunity to work collaboratively on a database of research projects. Users can jointly, and in a semantically controlled way, enter metadata on research projects, which are the basis for the computation and visualisation of indicators.

preprint 2014 arXiv

Editorial for the Bibliometric-enhanced Information Retrieval Workshop at ECIR 2014

This first "Bibliometric-enhanced Information Retrieval" (BIR 2014) workshop aims to engage with the IR community about possible links to bibliometrics and scholarly communication. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of co-authorship networks, can improve retrieval services for specific communities, as well as for large, cross-domain collections. This workshop aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics / scientometrics and to create a common ground for the incorporation of bibliometric-enhanced services into retrieval at the digital library interface. Our interests include information retrieval, information seeking, science modelling, network analysis, and digital libraries. The goal is to apply insights from bibliometrics, scientometrics, and informetrics to concrete practical problems of information retrieval and browsing.
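
To make the Bradfordizing idea mentioned above more concrete, here is a simplified sketch, not taken from the workshop materials. It assumes Bradfordizing means re-ranking a result set so that documents from the journals contributing the most documents to that result set (the "core" journals) come first. The function name `bradfordize` and the sample records are hypothetical, and the sketch skips any partitioning of journals into Bradford zones.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank (doc_id, journal) pairs by how often each journal occurs in the result set.

    Python's sort is stable, so documents from equally productive journals
    keep their original relative order.
    """
    journal_counts = Counter(journal for _, journal in results)
    return sorted(results, key=lambda item: -journal_counts[item[1]])


# Hypothetical result list in its original ranking order.
results = [
    ("d1", "Journal A"),
    ("d2", "Journal B"),
    ("d3", "Journal B"),
    ("d4", "Journal C"),
    ("d5", "Journal B"),
    ("d6", "Journal A"),
]
for doc_id, journal in bradfordize(results):
    print(doc_id, journal)
```

In this toy example the three "Journal B" documents move to the top because that journal is the most productive one for the query.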

preprint 2014 arXiv

Establishing an Online Access Panel for Interactive Information Retrieval Research

We propose an online access panel, called IIRpanel, to support the evaluation process of Interactive Information Retrieval (IIR) systems. By maintaining an online access panel with users of IIR systems, we assume that the recurring effort to recruit participants for web-based as well as for lab studies can be minimized. We aim to use the online access panel not only for our own development processes but also to open it to other interested researchers in the field of IIR. In this paper we present the concept of IIRpanel as well as first implementation details.

preprint 2014 arXiv

Identifying User Behavior in domain-specific Repositories

This paper presents an analysis of the user behavior of two different domain-specific repositories. The web analytics tool etracker was used to gain a first overall insight into the user behavior of these repositories. Moreover, we extended our work to describe an Apache web log analysis approach which focuses on the identification of user behavior. To this end, the user traffic within our systems is visualized using chord diagrams. We found that recommendations are used frequently and that users rarely combine searching with faceting or filtering.

preprint 2014 arXiv

Is Evaluating Visual Search Interfaces in Digital Libraries Still an Issue?

Although various visual interfaces for digital libraries have been developed in prototypical systems, very few of these visual approaches have been integrated into today's digital libraries. In this position paper we argue that this is most likely due to the fact that the evaluation results of most visual systems lack comparability. There is no fixed standard on how to evaluate visual interactive user interfaces. Therefore it is not possible to identify which approach is more suitable for a certain context. We feel that the comparability of evaluation results could be improved by building a common evaluation setup consisting of a reference system, based on a standardized corpus with fixed tasks and a panel of possible participants.

preprint 2014 arXiv

Knowledge Maps and Information Retrieval (KMIR)

A particular point of failure in information systems is usually the vagueness between user search terms and the knowledge orders of the information space in question. Some kind of guided searching therefore becomes more and more important in order to precisely discover information without knowing the right search terms. Knowledge maps of digital library collections are promising navigation tools through knowledge spaces but are still far from being applicable for searching digital libraries. However, there is no continuous knowledge exchange between the "map makers" on the one hand and the Information Retrieval (IR) specialists on the other hand. Thus, there is also a lack of models that properly combine insights of the two strands. The proposed workshop aims at bringing together these two communities: experts in IR reflecting on visually enhanced search interfaces and experts in knowledge mapping reflecting on visualizations of the content of a collection that might also present a context for a search term in a visual manner. The intention of the workshop is to raise awareness of the potential of interactive knowledge maps for information seeking purposes and to create a common ground for experiments aiming at the incorporation of knowledge maps into IR models at the level of the user interface.

preprint 2014 arXiv

Recommender Systems using Pennant Diagrams in Digital Libraries

In digital libraries recommendations can be valuable for researchers, e.g. recommending related literature to a given context. Typically, in a scientific context the simple presentation of related content is not sufficient. Often the users demand a more detailed view on the connection of a document and its specific recommendations. The aim of pennants introduced by Howard White (2007) is to provide the user with a graph showing the relatedness / distance between a given document and related documents. Co-citation but also co-occurrence analysis are established methods for finding related documents to a seed. A seed could be for instance an author, a keyword, or a publication. In this paper we introduce a recommender system in the digital library sowiport using pennant diagrams which can be created from co-citation and/or co-occurrence analysis. The presentation at the NKOS workshop will present demos of pennants in sowiport and will elaborate on practical questions in visualizing pennants and evaluating the utility of pennants for search.

preprint 2014 arXiv

Social Media Monitoring of the Campaigns for the 2013 German Bundestag Elections on Facebook and Twitter

As more and more people use social media to communicate their view and perception of elections, researchers have increasingly been collecting and analyzing data from social media platforms. Our research focuses on social media communication related to the 2013 election of the German parliament [translation: Bundestagswahl 2013]. We constructed several social media datasets using data from Facebook and Twitter. First, we identified the most relevant candidates (n=2,346) and checked whether they maintained social media accounts. The Facebook data was collected in November 2013 for the period of January 2009 to October 2013. On Facebook we identified 1,408 Facebook walls containing approximately 469,000 posts. Twitter data was collected between June and December 2013 finishing with the constitution of the government. On Twitter we identified 1,009 candidates and 76 other agents, for example, journalists. We estimated the number of relevant tweets to exceed eight million for the period from July 27 to September 27 alone. In this document we summarize past research in the literature, discuss possibilities for research with our data set, explain the data collection procedures, and provide a description of the data and a discussion of issues for archiving and dissemination of social media data.

preprint2013arXiv

An OAI-PMH-based Web Service for the Generation of Co-Author Networks

We will present a new component of our technical framework that was built to provide a broad range of reusable web services for the enhancement of typical scientific retrieval processes. The proposed component computes the betweenness of authors in co-authorship networks extracted from publicly available metadata that was harvested using OAI-PMH.
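To make the betweenness computation concrete, here is a minimal Python sketch that builds a co-authorship graph from author lists and computes betweenness centrality with Brandes' algorithm for unweighted, undirected graphs. It is an illustration only: the function names and the toy records are hypothetical, the OAI-PMH harvesting step is left out, and the abstract does not say which algorithm or normalization the actual web service uses.

```python
from collections import defaultdict, deque
from itertools import combinations

def coauthor_graph(author_lists):
    """Build an undirected graph: authors are linked if they share a record."""
    graph = defaultdict(set)
    for authors in author_lists:
        for a in authors:
            graph[a]  # touch the key so solo authors appear as isolated nodes
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (BFS per source)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical author lists, as they might be extracted from harvested records.
records = [["A", "B"], ["B", "C"], ["C", "D"]]
centrality = betweenness(coauthor_graph(records))
print(centrality)
```

In the toy chain A-B-C-D, the two middle authors each lie on two shortest paths between other authors, while the two end authors lie on none.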

preprint2013arXiv

Assessing Visualization Techniques for the Search Process in Digital Libraries

In this paper we present an overview of several visualization techniques to support the search process in Digital Libraries (DLs). The search process can typically be separated into three major phases: query formulation and refinement, browsing through result lists, and viewing and interacting with documents and their properties. We discuss a selection of popular visualization techniques that have been developed for the different phases to support the user during the search process. Using prototypes based on the different techniques, we show how the approaches have been implemented. Although various visualizations have been developed in prototypical systems, very few of these approaches have been adapted into today's DLs. We conclude that this is most likely due to the fact that most systems are not evaluated intensely in real-life scenarios with real information seekers and that results of the interesting visualization techniques are often not comparable. We can say that many of the assessed systems did not properly address the information need of current users.

preprint2013arXiv

Bibliometric-enhanced Information Retrieval

Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of coauthorship networks, can improve retrieval services for specific communities, as well as for large, cross-domain collections. This workshop aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics/scientometrics and to create a common ground for the incorporation of bibliometric-enhanced services into retrieval at the digital library interface.

preprint2013arXiv

Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems

Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this paper we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of coauthorship networks, can improve retrieval services for specific communities, as well as for large, cross-domain collections. This paper aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics / scientometrics and to create a common ground for the incorporation of bibliometric-enhanced services into retrieval at the digital library interface.

preprint2013arXiv

Pennants for Descriptors

We present a new technique (called pennants) for displaying the descriptors related to a descriptor across literatures, rather than in a thesaurus. It has definite implications for online searching and browsing. Pennants, named for the flag they resemble, are a form of algorithmic prediction. Their cognitive base is in relevance theory (RT) from linguistic pragmatics (Sperber & Wilson 1995).

preprint2013arXiv

Relevance distributions across Bradford Zones: Can Bradfordizing improve search?

The purpose of this paper is to describe the evaluation of the effectiveness of the bibliometric technique Bradfordizing in an information retrieval (IR) scenario. Bradfordizing is used to re-rank topical document sets from conventional abstracting & indexing (A&I) databases into core and more peripheral document zones. Bradfordized lists of journal articles and monographs will be tested in a controlled scenario consisting of different A&I databases from social and political sciences, economics, psychology and medical science, 164 standardized IR topics and intellectual assessments of the listed documents. Does Bradfordizing improve the ratio of relevant documents in the first third (core) compared to the second and last third (zone 2 and zone 3, respectively)? The IR tests show that relevance distributions after re-ranking improve at a significant level if documents in the core are compared with documents in the succeeding zones. After Bradfordizing of document pools, the core has a significantly better average precision than zone 2, zone 3 and baseline. This paper should be seen as an argument in favour of alternative non-textual (bibliometric) re-ranking methods which can be simply applied in text-based retrieval systems and in particular in A&I databases.
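The re-ranking idea can be sketched in a few lines of Python: count how many documents each journal contributes to a result set, move documents from the most productive journals to the top, and cut the re-ranked list into three zones of roughly equal document counts. The function name, the alphabetical tie-breaking between equally productive journals and the rule that a journal is never split across zones are assumptions of this sketch, not details confirmed by the abstract.

```python
from collections import Counter

def bradfordize(results):
    """Re-rank (doc_id, journal) hits by journal productivity.

    Returns (doc_id, journal, zone) triples, where zone 1 is the core.
    Zone boundaries fall at roughly one third and two thirds of the
    documents; a journal is kept whole within a single zone.
    """
    counts = Counter(journal for _, journal in results)
    # sorted() is stable, so the original order is kept within a journal.
    reranked = sorted(results, key=lambda hit: (-counts[hit[1]], hit[1]))

    total = len(results)
    zone_of = {}
    zoned = []
    for position, (doc_id, journal) in enumerate(reranked):
        if journal not in zone_of:
            # Zone is fixed when the first document of a journal is reached.
            zone_of[journal] = min(2, 3 * position // total) + 1
        zoned.append((doc_id, journal, zone_of[journal]))
    return zoned

# Hypothetical result list in its original (e.g. text-based) ranking order.
hits = [("d1", "C"), ("d2", "A"), ("d3", "B"), ("d4", "A"), ("d5", "D"),
        ("d6", "B"), ("d7", "A"), ("d8", "E"), ("d9", "F")]
ranked = bradfordize(hits)
for doc_id, journal, zone in ranked:
    print(doc_id, journal, zone)
```

In this toy example the core holds three documents from a single journal, zone 2 holds three documents from two journals and zone 3 holds three documents from three journals, which mirrors the growing journal scatter that Bradford zones describe.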

preprint2012arXiv

Discovering Links for Metadata Enrichment on Computer Science Papers

At the very beginning of compiling a bibliography, usually only basic information, such as the title, authors and publication date of an item, is known. In order to gather additional information about a specific item, one typically has to search the library catalog or use a web search engine. This look-up procedure implies a manual effort for every single item of a bibliography. In this technical report we present a proof of concept which utilizes Linked Data technology for the simple enrichment of sparse metadata sets. This is done by discovering owl:sameAs links between an initial set of computer science papers and resources from external data sources like DBLP, ACM and the Semantic Web Conference Corpus. In this report, we demonstrate how the link discovery tool Silk is used to detect additional information and to enrich an initial set of records in the computer science domain. The pros and cons of Silk as a link discovery tool are summarized at the end.

preprint2012arXiv

Extending Term Suggestion with Author Names

Term suggestion or recommendation modules can help users to formulate their queries by mapping their personal vocabularies onto the specialized vocabulary of a digital library. When we examined actual user queries of the social sciences digital library Sowiport, we could see that nearly one third of the users were explicitly looking for author names rather than terms. Common term recommenders neglect this fact. By picking up the idea of polyrepresentation we could show that in a standardized IR evaluation setting we can significantly increase the retrieval performance by adding topically related author names to the query. This positive effect only appears when the query is additionally expanded with thesaurus terms. By just adding the author names to a query we often observe a query drift which leads to worse results.

preprint2012arXiv

Improving Retrieval Results with discipline-specific Query Expansion

Choosing the right terms to describe an information need is becoming more difficult as the amount of available information increases. Search-Term-Recommendation (STR) systems can help to overcome these problems. This paper evaluates the benefits that may be gained from the use of STRs in Query Expansion (QE). We create 17 STRs, 16 based on specific disciplines and one giving general recommendations, and compare the retrieval performance of these STRs. The main findings are: (1) QE with specific STRs leads to significantly better results than QE with a general STR, (2) QE with specific STRs selected by a heuristic mechanism of topic classification leads to better results than the general STR, however (3) selecting the best matching specific STR in an automatic way is a major challenge of this process.

preprint2012arXiv

Integrating Interactive Visualizations in the Search Process of Digital Libraries and IR Systems

Interactive visualizations for exploration and retrieval have not yet become an integral part of digital libraries and information retrieval systems. We have integrated a set of interactive graphics in a real world social science digital library. These visualizations support the exploration of search queries, results and authors, can filter search results, show trends in the database and can support the creation of new search queries. The use of weighted brushing supports the identification of related metadata for search facets. We discuss some use cases of the combination of IR systems and interactive graphics. In a user study we verify that users can gain insights from statistical graphics intuitively and can adopt interaction techniques.

preprint2012arXiv

Visualizations in Exploratory Search: A User Study with Stock Market Information

In this paper we present an approach that integrates interactive visualizations in the exploratory search process. In this model visualizations can act as hubs where large amounts of information are made accessible in easy user interfaces. Through interaction techniques this information can be combined with related information on the World Wide Web. We applied the new search concept to the domain of stock market information and conducted a user study. Participants could use this interface without instructions and could complete complex tasks such as identifying related information items, linking heterogeneous information types and using different interaction techniques to access related information more easily. In this way, users could quickly acquire knowledge in an unfamiliar domain.

preprint2011arXiv

A Science Model Driven Retrieval Prototype

This paper is about a better understanding of the structure and dynamics of science and the usage of these insights for compensating for the typical problems that arise in metadata-driven Digital Libraries. Three science model driven retrieval services are presented: co-word analysis based query expansion, re-ranking via Bradfordizing and author centrality. The services are evaluated with relevance assessments from which two important implications emerge: (1) precision values of the retrieval services are the same or better than the tf-idf retrieval baseline and (2) each service retrieved a disjoint set of documents. The different services each favor quite different, but still relevant, documents than pure term-frequency based rankings. The proposed models and derived retrieval services therefore open up new viewpoints on the scientific knowledge space and provide an alternative framework to structure scholarly information systems.

preprint2011arXiv

Applying Science Models for Search

The paper proposes three different kinds of science models as value-added services that are integrated in the retrieval process to enhance retrieval quality. The paper discusses the approaches Search Term Recommendation, Bradfordizing and Author Centrality on a general level and addresses implementation issues of the models within a real-life retrieval environment.

preprint2011arXiv

Comparing webometric with web-independent rankings: a case study with German universities

In this paper we examine if hyperlink-based (webometric) indicators can be used to rank academic websites. Therefore we analyzed the interlinking structure of German university websites and compared our simple hyperlink-based ranking with official and web-independent rankings of universities. We found that link impact could not easily be seen as a prestige factor for universities.

preprint2010arXiv

Establishing a Multi-Thesauri-Scenario based on SKOS and Cross-Concordances

This case study proposes a scenario with three topic-related thesauri, which have been connected with bilateral cross-concordances as part of a major terminology mapping initiative in the project KoMoHe (Mayr & Petras, 2008). The thesauri have already been or will be converted to SKOS and in order to not omit the relevant crosswalks, the mapping properties of SKOS will be used for modeling them adequately.

preprint2010arXiv

Implications of Inter-Rater Agreement on a Student Information Retrieval Evaluation

This paper is about an information retrieval evaluation of three different retrieval-supporting services. All three services were designed to compensate for typical problems that arise in metadata-driven Digital Libraries, which are not adequately handled by a simple tf-idf based retrieval. The services are: (1) a co-word analysis based query expansion mechanism and re-ranking via (2) Bradfordizing and (3) author centrality. The services are evaluated with relevance assessments conducted by 73 information science students. Since the students are neither information professionals nor domain experts, the question of inter-rater agreement is taken into consideration. Two important implications emerge: (1) the inter-rater agreement rates were mainly fair to moderate and (2) after a data-cleaning step which erased the assessments with poor agreement rates, the evaluation data shows that the three retrieval services returned disjoint but still relevant result sets.
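As an illustration of how agreement between assessors can be quantified, the following Python sketch computes Cohen's kappa for two raters who judged the same documents. This is an assumption-laden example: the abstract does not name the agreement statistic that was used, and with 73 assessors a multi-rater measure may have been applied instead. The labels "fair" and "moderate" are commonly attached to kappa values of roughly 0.21 to 0.40 and 0.41 to 0.60, respectively, on the Landis and Koch scale. The judgments below are invented.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(rater_a)
    # Share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical binary relevance judgments (1 = relevant, 0 = not relevant).
student_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
student_2 = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
kappa = cohen_kappa(student_1, student_2)
print(round(kappa, 2))
```

Here the two raters agree on 7 of 10 documents while chance agreement is 0.5, which gives a kappa of about 0.4, at the boundary between "fair" and "moderate" on that scale.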

45 published item(s)

Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR

MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval

Evaluation of Embedding Models for Automatic Extraction and Classification of Acknowledged Entities in Scientific Documents

Studying Retrievability of Publications and Datasets in an Integrated Retrieval System

The many facets of academic mobility and its impact on scholars' career

Towards Automated Survey Variable Search and Summarization in Social Science Publications

Bibliometric-enhanced Information Retrieval 10th Anniversary Workshop Edition

ECIR 2020 Workshops: Assessing the Impact of Going Online

The OpenCitations Data Model

A Semi-Automatic Approach for Detecting Dataset References in Social Science Texts

Bibliometrics and Information Retrieval: Creating Knowledge through Research Synergies

Identifying and Improving Dataset References in Social Sciences Full Texts

Opening Scholarly Communication in Social Sciences: Supporting Open Peer Review with Fidus Writer

Assessing a human mediated current awareness service

Bibliometric-enhanced Information Retrieval: 2nd International BIR Workshop

Bibliometric-Enhanced Information Retrieval: 3rd International BIR Workshop

Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

Editorial for the Proceedings of the Workshop Knowledge Maps and Information Retrieval (KMIR2014) at Digital Libraries 2014

Extending search facilities via bibliometric-enhanced stratagems

Mining Scientific Papers for Bibliometrics: a (very) Brief Survey of Methods and Tools

Are topic-specific search term, journal name and author name recommendations relevant for researchers?

Assessing Educational Research -- An Information Service for Monitoring a Heterogeneous Research Field

Editorial for the Bibliometric-enhanced Information Retrieval Workshop at ECIR 2014

Establishing an Online Access Panel for Interactive Information Retrieval Research

Identifying User Behavior in domain-specific Repositories

Is Evaluating Visual Search Interfaces in Digital Libraries Still an Issue?

Knowledge Maps and Information Retrieval (KMIR)

Recommender Systems using Pennant Diagrams in Digital Libraries

Social Media Monitoring of the Campaigns for the 2013 German Bundestag Elections on Facebook and Twitter

An OAI-PMH-based Web Service for the Generation of Co-Author Networks

Assessing Visualization Techniques for the Search Process in Digital Libraries

Bibliometric-enhanced Information Retrieval

Bibliometric-enhanced Retrieval Models for Big Scholarly Information Systems

Pennants for Descriptors

Relevance distributions across Bradford Zones: Can Bradfordizing improve search?

Discovering Links for Metadata Enrichment on Computer Science Papers

Extending Term Suggestion with Author Names

Improving Retrieval Results with discipline-specific Query Expansion

Integrating Interactive Visualizations in the Search Process of Digital Libraries and IR Systems

Visualizations in Exploratory Search: A User Study with Stock Market Information

A Science Model Driven Retrieval Prototype

Applying Science Models for Search

Comparing webometric with web-independent rankings: a case study with German universities

Establishing a Multi-Thesauri-Scenario based on SKOS and Cross-Concordances

Implications of Inter-Rater Agreement on a Student Information Retrieval Evaluation