Source author record

Fabian Panse

Fabian Panse appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Databases

Catalog footprint

What is connected

3 works

1 topic

4 close collaborators


Preprint · 2023 · arXiv

Towards Scalable Generation of Realistic Test Data for Duplicate Detection

Due to the increasing volume, volatility, and diversity of data in virtually all areas of our lives, the ability to detect duplicates in potentially linked data sources is more important than ever before. However, while research is already intensively engaged in adapting duplicate detection algorithms to the changing circumstances, existing test data generators are still designed for small -- mostly relational -- datasets and can thus fulfill their intended task only to a limited extent. In this report, we present our ongoing research on a novel approach for test data generation that -- in contrast to existing solutions -- is able to produce large test datasets with complex schemas and more realistic error patterns while being easy to use for inexperienced users.

Preprint · 2022 · arXiv

Frost: A Platform for Benchmarking and Exploring Data Matching Results

"Bad" data has a direct impact on 88% of companies, with the average company losing 12% of its revenue due to it. Duplicates - multiple but different representations of the same real-world entities - are among the main reasons for poor data quality, so finding and configuring the right deduplication solution is essential. Existing data matching benchmarks focus on the quality of matching results and neglect other important factors, such as business requirements. Additionally, they often do not support the exploration of data matching results. To address this gap between the mere counting of record pairs vs. a comprehensive means to evaluate data matching solutions, we present the Frost platform. It combines existing benchmarks, established quality metrics, cost and effort metrics, and exploration techniques, making it the first platform to allow systematic exploration to understand matching results. Frost is implemented and published in the open-source application Snowman, which includes the visual exploration of matching results.

Preprint · 2022 · arXiv

Towards Polyglot Data Stores -- Overview and Open Research Questions

Nowadays, data-intensive applications face the problem of handling heterogeneous data with sometimes mutually exclusive use cases and soft non-functional goals such as consistency and availability. Since no single platform copes with everything, various stores (RDBMS, NewSQL, NoSQL) have been developed for different workloads and use cases. However, since each store is only a specialization, progress in polyglot data management has led to new systems called Multi- and Polystores. They try to access different stores transparently and combine their capabilities to serve one or multiple given use cases. This paper describes representative real-world use cases for data-intensive applications (OLTP and OLAP) and derives a set of requirements for polyglot data stores. Subsequently, we discuss the properties of selected Multi- and Polystores and evaluate them against these requirements, illustrated by three common application use cases. We classify them by functional features, query processing technique, architecture, and adaptivity, and reveal a lack of capabilities, especially regarding changing conditions and tight integration. Finally, we outline the benefits and drawbacks of the surveyed systems and propose future research directions and current challenges in this area.
