Source author record

Andreas Thor

Andreas Thor appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Digital Libraries; Databases; Distributed, Parallel, and Cluster Computing; Information Retrieval

Catalog footprint

What is connected

7 works

4 topics

4 close collaborators


preprint, 2016, arXiv

Introducing CitedReferencesExplorer (CRExplorer): A program for Reference Publication Year Spectroscopy with Cited References Standardization

We introduce a new tool - the CitedReferencesExplorer (CRExplorer, www.crexplorer.net) - which can be used to disambiguate and analyze the cited references (CRs) of a publication set downloaded from the Web of Science (WoS). The tool is especially suited to identifying those publications which have been frequently cited by the researchers in a field and thereby to studying, for example, the historical roots of a research field or topic. CRExplorer simplifies the identification of key publications by enabling the user to work with both a graph for identifying the most frequently cited reference publication years (RPYs) and the list of references for the RPYs which have been most frequently cited. A further focus of the program is on the standardization of CRs. It is a serious problem in bibliometrics that there are several variants of the same CR in the WoS. In this study, CRExplorer is used to study the CRs of all papers published in the Journal of Informetrics. The analyses focus on the most important papers published between 1980 and 1990.
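The graph of frequently cited reference publication years can be illustrated with a minimal sketch. It counts cited references per RPY and, as is common in RPYS work, compares each year with the median of its five-year neighborhood. The function names, the toy data, and the treatment of years outside the observed range as zero are assumptions made for illustration; this is not CRExplorer's code.

```python
from collections import Counter
from statistics import median


def rpys_spectrum(cited_years):
    """Count cited references per referenced publication year (RPY)."""
    counts = Counter(cited_years)
    years = range(min(counts), max(counts) + 1)
    # Years without any cited reference appear explicitly with a count of 0.
    return {year: counts.get(year, 0) for year in years}


def deviation_from_median(spectrum, half_window=2):
    """Deviation of each year's count from the median of the surrounding
    (2 * half_window + 1)-year window. Years outside the spectrum count as 0."""
    result = {}
    for year, n in spectrum.items():
        window = [
            spectrum.get(y, 0)
            for y in range(year - half_window, year + half_window + 1)
        ]
        result[year] = n - median(window)
    return result


# Hypothetical toy data: the publication year of each cited reference.
cited_years = [1963] * 3 + [1965, 1972] + [1973] * 5 + [1975]

spectrum = rpys_spectrum(cited_years)
deviation = deviation_from_median(spectrum)
peak_year = max(deviation, key=deviation.get)
```

A pronounced positive deviation marks a year whose references are cited more often than its neighbors, which is where one would then inspect the list of individual cited references.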

preprint, 2016, arXiv

New features of CitedReferencesExplorer (CRExplorer)

CRExplorer version 1.6.7 was released on July 5, 2016. This version includes the following new features and improvements:

Scopus: Using "File" - "Import" - "Scopus", CRExplorer reads files from Scopus. The file format "CSV" (including citations, abstracts and references) should be chosen in Scopus for downloading records.

Export facilities: Using "File" - "Export" - "Scopus", CRExplorer exports files in the Scopus format. Using "File" - "Export" - "Web of Science", CRExplorer exports files in the Web of Science format. These files can be imported in other bibliometric programs (e.g. VOSviewer).

Space bar: Select a specific cited reference in the cited references table, press the space bar, and all bibliographic details of the CR are shown.

Internal file format: Using "File" - "Save", working files are saved in the internal file format "*.cre". The files include all data including matching results and manual matching corrections. The files can be opened by using "File" - "Open".

preprint, 2016, arXiv

Referenced Publication Year Spectroscopy (RPYS) and Algorithmic Historiography: The Bibliometric Reconstruction of András Schubert's Œuvre

Referenced Publication Year Spectroscopy (RPYS) was recently introduced as a method to analyze the historical roots of research fields and groups or institutions. RPYS maps the distribution of the publication years of the cited references in a document set. In this study, we apply this methodology to the œuvre of an individual researcher on the occasion of a Festschrift for András Schubert's 70th birthday. We discuss the different options of RPYS in relation to one another (e.g. Multi-RPYS), and in relation to the longer-term research program of algorithmic historiography (e.g., HistCite) based on Schubert's publications (n=172) and cited references therein as a bibliographic domain in scientometrics. Main path analysis and Multi-RPYS of the citation network are used to show the changes and continuities in Schubert's intellectual career. Diachronic and static decomposition of a document set can lead to different results, while the analytically distinguishable lines of research may overlap and interact over time, and do so intermittently.

preprint, 2012, arXiv

How do Ontology Mappings Change in the Life Sciences?

Mappings between related ontologies are increasingly used to support data integration and analysis tasks. Changes in the ontologies also require the adaptation of ontology mappings. So far the evolution of ontology mappings has received little attention, although ontologies change continuously, especially in the life sciences. We therefore analyze how mappings between popular life science ontologies evolve for different match algorithms. We also evaluate which semantic ontology changes primarily affect the mappings. We further investigate alternatives to predict or estimate the degree of future mapping changes based on previous ontology and mapping transitions.

preprint, 2011, arXiv

Load Balancing for MapReduce-based Entity Resolution

The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an even redistribution of data between map and reduce tasks. In the presence of skewed data, sophisticated redistribution approaches thus become necessary to achieve load balancing among all reduce tasks to be executed in parallel. For the complex problem of entity resolution, we propose and evaluate two approaches for such skew handling and load balancing. The approaches support blocking techniques to reduce the search space of entity resolution, utilize a preprocessing MapReduce job to analyze the data distribution, and distribute the entities of large blocks among multiple reduce tasks. The evaluation on a real cloud infrastructure shows the value and effectiveness of the proposed load balancing approaches.
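The idea of analyzing the block-size distribution first and then spreading large blocks over several reduce tasks can be sketched in plain Python. This is a simplified illustration, not the two approaches evaluated in the paper: the fair-share threshold, the splitting of a large block into as many sub-blocks as there are reducers, the greedy largest-task-first assignment, and all names are assumptions.

```python
import heapq
from collections import Counter


def pairs_within(n):
    """Number of pairwise comparisons among n entities."""
    return n * (n - 1) // 2


def split_block(size, parts):
    """Split a block into `parts` near-equal sub-blocks and list the match
    tasks: one per sub-block and one per pair of sub-blocks."""
    base, extra = divmod(size, parts)
    subs = [base + (1 if i < extra else 0) for i in range(parts)]
    tasks = [((i, i), pairs_within(s)) for i, s in enumerate(subs)]
    tasks += [
        ((i, j), subs[i] * subs[j])
        for i in range(parts)
        for j in range(i + 1, parts)
    ]
    return tasks


def plan_match_tasks(blocking_keys, num_reducers):
    """Assign match tasks to reducers so that comparison counts are balanced."""
    # Stand-in for the preprocessing job: entities per blocking key.
    sizes = Counter(blocking_keys)
    total = sum(pairs_within(n) for n in sizes.values())
    fair_share = total / num_reducers

    tasks = []
    for key, n in sizes.items():
        if pairs_within(n) > fair_share:
            # Large block: spread its comparisons over several match tasks.
            tasks += [(key, label, c) for label, c in split_block(n, num_reducers) if c > 0]
        elif pairs_within(n) > 0:
            tasks.append((key, (0, 0), pairs_within(n)))

    # Greedy assignment: largest task first, always to the least-loaded reducer.
    loads = [(0, r) for r in range(num_reducers)]
    heapq.heapify(loads)
    assignment = {r: [] for r in range(num_reducers)}
    for task in sorted(tasks, key=lambda t: t[2], reverse=True):
        load, r = heapq.heappop(loads)
        assignment[r].append(task)
        heapq.heappush(loads, (load + task[2], r))
    return assignment


# Hypothetical skewed input: one blocking key per entity.
keys = ["a"] * 10 + ["b"] * 3 + ["c"] * 2
assignment = plan_match_tasks(keys, num_reducers=2)
reducer_loads = {r: sum(c for _, _, c in ts) for r, ts in assignment.items()}
```

Without splitting, the reducer that receives block "a" would perform 45 of the 49 comparisons in this toy input; splitting preserves the total number of comparisons while evening out the per-reducer load.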

preprint, 2010, arXiv

Evaluation of Query Generators for Entity Search Engines

Dynamic web applications such as mashups need efficient access to web data that is only accessible via entity search engines (e.g. product or publication search engines). However, most current mashup systems and applications only support simple keyword searches for retrieving data from search engines. We propose the use of more powerful search strategies building on so-called query generators. For a given set of entities, query generators are able to automatically determine a set of search queries to retrieve these entities from an entity search engine. We demonstrate the usefulness of query generators for on-demand web data integration and evaluate the effectiveness and efficiency of query generators for a challenging real-world integration scenario.

preprint, 2010, arXiv

Parallel Sorted Neighborhood Blocking with MapReduce

Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations for Sorted Neighborhood blocking that either use multiple MapReduce jobs or apply a tailored data replication.
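Sorted Neighborhood blocking sorts entities by a blocking key and compares only those that fall within a sliding window of fixed size. The sketch below shows the sequential method and a partitioned variant in which each partition additionally receives the last window - 1 entities of its predecessor, which is one way to read "tailored data replication". It runs in a single process and only mimics the partitioning; the function names, the sample data, and the use of a set to discard pairs generated twice in the replicated region are assumptions, not the paper's implementation.

```python
def sorted_neighborhood_pairs(entities, key, window):
    """Sequential Sorted Neighborhood: candidate pairs within a sliding window."""
    ordered = sorted(entities, key=key)
    pairs = set()
    for i, left in enumerate(ordered):
        # Each entity is paired with its next (window - 1) neighbors.
        for right in ordered[i + 1 : i + window]:
            pairs.add((left, right))
    return pairs


def partitioned_pairs(entities, key, window, num_partitions):
    """Same candidate pairs, computed per partition with boundary replication."""
    ordered = sorted(entities, key=key)  # stand-in for range partitioning by key
    if not ordered:
        return set()
    size = -(-len(ordered) // num_partitions)  # ceiling division
    pairs = set()
    for start in range(0, len(ordered), size):
        # Replicate the last (window - 1) entities of the preceding partition
        # so that windows spanning a partition boundary are not lost.
        chunk = ordered[max(0, start - (window - 1)) : start + size]
        for i, left in enumerate(chunk):
            for right in chunk[i + 1 : i + window]:
                pairs.add((left, right))
    return pairs


# Hypothetical sample data; the entity itself serves as the blocking key.
names = ["smith", "smyth", "smithe", "jones", "jonas", "johnson", "brown", "braun"]
sequential = sorted_neighborhood_pairs(names, key=lambda s: s, window=3)
partitioned = partitioned_pairs(names, key=lambda s: s, window=3, num_partitions=3)
```

Both functions yield the same candidate pairs for unique entities, because every pair within a window ends in exactly one partition and its earlier member lies at most window - 1 positions back, inside the replicated region.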
