Source author record

Roman Kern

Roman Kern appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Artificial Intelligence, Information Retrieval, Computation and Language, Cryptography and Security, eess.IV, eess.SY, Human-Computer Interaction, Systems and Control

Catalog footprint

What is connected

8 works

9 topics

4 close collaborators


preprint · 2022 · arXiv

Formula RL: Deep Reinforcement Learning for Autonomous Racing using Telemetry Data

This paper explores the use of reinforcement learning (RL) models for autonomous racing. In contrast to passenger cars, where safety is the top priority, a racing car aims to minimize the lap-time. We frame the problem as a reinforcement learning task with a multidimensional input consisting of the vehicle telemetry, and a continuous action space. To find out which RL methods better solve the problem and whether the obtained models generalize to driving on unknown tracks, we put 10 variants of deep deterministic policy gradient (DDPG) to race in two experiments: i) studying how RL methods learn to drive a racing car and ii) studying how the learning scenario influences the capability of the models to generalize. Our studies show that models trained with RL are not only able to drive faster than the baseline open source handcrafted bots but also generalize to unknown tracks.

preprint · 2022 · arXiv

How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing

Deep learning (DL) models for natural language processing (NLP) tasks often handle private data, demanding protection against breaches and disclosures. Data protection laws, such as the European Union's General Data Protection Regulation (GDPR), thereby enforce the need for privacy. Although many privacy-preserving NLP methods have been proposed in recent years, no categories to organize them have been introduced yet, making it hard to follow the progress of the literature. To close this gap, this article systematically reviews over sixty DL methods for privacy-preserving NLP published between 2016 and 2020, covering theoretical foundations, privacy-enhancing technologies, and analysis of their suitability for real-world scenarios. First, we introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods. Second, we present an extensive summary of privacy threats, datasets for applications, and metrics for privacy evaluation. Third, throughout the review, we describe privacy issues in the NLP pipeline in a holistic view. Further, we discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, the prevalence of human biases in embeddings, and the privacy-utility tradeoff. Finally, this review presents future research directions to guide successive research and development of privacy-preserving NLP models.

preprint · 2022 · arXiv

Privacy in Open Search: A Review of Challenges and Solutions

Privacy is of worldwide concern regarding activities and processes that include sensitive data. For this reason, many countries and territories have been recently approving regulations controlling the extent to which organizations may exploit data provided by people. Artificial intelligence areas, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms in order to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may cripple the security of users and be penalized by data protection laws. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data. Our contribution is threefold: firstly, we present an overview of privacy threats to IR tasks; secondly, we discuss applicable privacy-preserving mechanisms which may be employed in solutions to restrain privacy hazards; finally, we bring insights on the tradeoffs between privacy preservation and utility performance for IR tasks.

preprint · 2022 · arXiv

Towards a General Framework to Embed Advanced Machine Learning in Process Control Systems

Since high data volume and complex data formats delivered in modern high-end production environments go beyond the scope of classical process control systems, more advanced tools involving machine learning are required to reliably recognize failure patterns. However, currently, such systems lack a general setup and are only available as application-specific solutions. We propose a process control framework entitled Health Factor for Process Control (HFPC) to bridge the gap between conventional statistical tools and novel machine learning (ML) algorithms. HFPC comprises two main concepts: (a) pattern type to account for qualitative characteristics (error patterns) and (b) intensity to quantify the level of a deviation. While the system retains large model generality, allowing a broad scope of potential application areas, we demonstrate its favorable mathematical properties in a theoretical analysis. In a case study from the semiconductor industry, we underline that (a) our framework is of practical relevance and goes beyond conventional process control, and (b) achieves high-quality experimental results. We conclude that our work contributes to the integration of ML in real-world process control and paves the way to automated decision support in manufacturing.

preprint · 2021 · arXiv

On the Impact of Communities on Semi-supervised Classification Using Graph Neural Networks

Graph Neural Networks (GNNs) are effective in many applications. Still, there is a limited understanding of the effect of common graph structures on the learning process of GNNs. In this work, we systematically study the impact of community structure on the performance of GNNs in semi-supervised node classification on graphs. Following an ablation study on six datasets, we measure the performance of GNNs on the original graphs, and the change in performance in the presence and the absence of community structure. Our results suggest that communities typically have a major impact on the learning process and classification performance. For example, in cases where the majority of nodes from one community share a single classification label, breaking up community structure results in a significant performance drop. On the other hand, for cases where labels show low correlation with communities, we find that the graph structure is rather irrelevant to the learning process, and a feature-only baseline becomes hard to beat. With our work, we provide deeper insights into the abilities and limitations of GNNs, including a set of general guidelines for model selection based on the graph structure.

preprint · 2020 · arXiv

A Formally Robust Time Series Distance Metric

Distance-based classification is among the most competitive classification methods for time series data. The most critical component of distance-based classification is the selected distance function. Past research has proposed various distance metrics or measures dedicated to particular aspects of real-world time series data, yet there is an important aspect that has not been considered so far: robustness against arbitrary data contamination. In this work, we propose a novel distance metric that is robust against arbitrarily "bad" contamination and has a worst-case computational complexity of $\mathcal{O}(n\log n)$. We formally argue why our proposed metric is robust, and demonstrate in an empirical evaluation that the metric yields competitive classification accuracy when applied in k-Nearest Neighbor time series classification.

preprint · 2016 · arXiv

From Data to Visualisations and Back: Selecting Visualisations Based on Data and System Design Considerations

Graphical interfaces and interactive visualisations are typical mediators between human users and data analytics systems. HCI researchers and developers have to be able to understand both human needs and back-end data analytics. Participants of our tutorial will learn how visualisation and interface design can be combined with data analytics to provide better visualisations. In the first of three parts, the participants will learn about visualisations and how to appropriately select them. In the second part, restrictions and opportunities associated with different data analytics systems will be discussed. In the final part, the participants will have the opportunity to develop visualisations and interface designs under given scenarios of data and system settings.

preprint · 2014 · arXiv

Recommending Scientific Literature: Comparing Use-Cases and Algorithms

An important aspect of a researcher's activities is to find relevant and related publications. The task of a recommender system for scientific publications is to provide a list of papers that match these criteria. Based on the collection of publications managed by Mendeley, four data sets have been assembled that reflect different aspects of relatedness. Each of these relatedness scenarios reflects a user's search strategy. These scenarios are public groups, venues, author publications and user libraries. The first three of these data sets are being made publicly available for other researchers to compare algorithms against. Three recommender systems have been implemented: a collaborative filtering system; a content-based filtering system; and a hybrid of these two systems. Results from testing demonstrate that collaborative filtering slightly outperforms the content-based approach, but fails in some scenarios. The hybrid system, which combines the two recommendation methods, provides the best performance, achieving a precision of up to 70%. This suggests that both techniques contribute complementary information in the context of recommending scientific literature and that different approaches suit different information needs.
