Source author record

Marc-André Kaufhold

Marc-André Kaufhold appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language cs.CY Networking and Internet Architecture

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation and a taxonomy for existing works, this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners. Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and give state-of-the-art references expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a building block for future work are provided.

preprint2022arXiv

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by artificially created training data. In NLP, there is the challenge of establishing universal rules for text transformations which provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the performance of classifiers for long and short texts. We achieved promising improvements when evaluating short as well as long text tasks with the enhancement by our text generation method. Especially with regard to small data analytics, additive accuracy gains of up to 15.53% and 3.56% are achieved within a constructed low data regime, compared to the no augmentation baseline and another data augmentation technique. As the current track of these constructed regimes is not universally applicable, we also show major improvements in several real world low data tasks (up to +4.84 F1-score). Since we are evaluating the method from many perspectives (in total 11 datasets), we also observe situations where the method might not be suitable. We discuss implications and patterns for the successful application of our approach on different types of datasets.

preprint2020arXiv

Empirical Insights for Designing Information and Communication Technology for International Disaster Response

Due to the increase in natural disasters in the past years, Disaster Response Organizations (DROs) are faced with the challenge of coping with more and larger operations. Currently appointed Information and Communications Technology (ICT) used for coordination and communication is sometimes outdated and does not scale, while novel technologies have the potential to greatly improve disaster response efficiency. To allow adoption of these novel technologies, ICT system designers have to take into account the particular needs of DROs and characteristics of International Disaster Response (IDR). This work attempts to bring the humanitarian and ICT communities closer together. In this work, we analyze IDR-related documents and conduct expert interviews. Using open coding, we extract empirical insights and translate the peculiarities of DRO coordination and operation into tangible ICT design requirements. This information is based on interviews with active IDR staff as well as DRO guidelines and reports. Ultimately, the goal of this paper is to serve as a reference for future ICT research endeavors to support and increase the efficiency of IDR operations.