Source author record

Markus Peschl

Markus Peschl appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, Information Theory, Machine Learning, math.IT

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

Learning Perturbations for Soft-Output Linear MIMO Demappers

Tree-based demappers for multiple-input multiple-output (MIMO) detection, such as the sphere decoder, can achieve near-optimal performance but incur high computational cost due to their sequential nature. In this paper, we propose the perturbed linear demapper (PLM), a novel data-driven model for computing soft outputs in parallel. To achieve this, the PLM learns a distribution centered on an initial linear estimate and a log-likelihood ratio clipping parameter using end-to-end Bayesian optimization. Furthermore, we show that lattice reduction can be naturally incorporated into the PLM pipeline, which allows computational cost to be traded off against coded block error rate reduction. We find that the optimized PLM can achieve near maximum-likelihood (ML) performance in Rayleigh channels, making it an efficient alternative to tree-based demappers.
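The abstract names three building blocks: an initial linear estimate, a distribution centered on it, and clipping of log-likelihood ratios. The following is only a rough sketch of those pieces, not the paper's implementation. It assumes a zero-forcing estimate as the linear stage and a complex Gaussian as the perturbation distribution; the abstract specifies neither. All function names and the `scale` and `clip` parameters are hypothetical.

```python
import numpy as np

def zero_forcing_estimate(H, y):
    """Linear estimate x_hat = pinv(H) @ y (zero forcing is an assumption here)."""
    return np.linalg.pinv(H) @ y

def perturbed_candidates(x_hat, scale, num_samples, rng):
    """Draw candidates around the linear estimate.

    A complex Gaussian is assumed for illustration; the abstract only says
    that the learned distribution is centered on the linear estimate.
    """
    shape = (num_samples, x_hat.shape[0])
    noise = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return x_hat[None, :] + scale * noise

def clip_llrs(llrs, clip):
    """Limit log-likelihood ratios to the interval [-clip, clip]."""
    return np.clip(llrs, -clip, clip)
```

Each candidate can be evaluated independently of the others, which is what makes a scheme like this parallel, in contrast to a sequential tree search. In the paper, the parameters of the distribution and the clipping value are tuned end to end with Bayesian optimization, which this sketch does not attempt.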

Preprint, 2021, arXiv

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Inferring reward functions from demonstrations and from pairwise preferences are two promising approaches to aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, which makes it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms into a Pareto-optimal policy. By maintaining a distribution over scalarization weights, our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need to compute multiple policies. We empirically demonstrate the effectiveness of MORAL in two scenarios, which model a delivery task and an emergency task that require an agent to act in the presence of normative conflicts. Overall, we consider our research a step towards multi-objective RL with learned rewards, bridging the gap between the current reward learning and machine ethics literature.
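The idea of maintaining a distribution over scalarization weights can be illustrated with a small sketch. It is not the MORAL algorithm itself. It assumes linear scalarization, a particle approximation of the weight distribution drawn from a uniform Dirichlet prior, and a logistic (Bradley-Terry style) preference likelihood; the abstract specifies none of these. All function names are hypothetical.

```python
import numpy as np

def sample_weights(num_particles, num_objectives, rng):
    """Draw candidate weight vectors on the simplex (uniform Dirichlet prior, assumed)."""
    return rng.dirichlet(np.ones(num_objectives), size=num_particles)

def scalarize(reward_vector, weights):
    """Combine a vector-valued reward into one scalar by a weighted sum."""
    return float(np.dot(weights, reward_vector))

def update_posterior(particles, probs, returns_a, returns_b):
    """Reweight particles after trajectory A was preferred over trajectory B.

    A logistic likelihood of the scalarized return difference is assumed.
    """
    diff = particles @ (returns_a - returns_b)
    likelihood = 1.0 / (1.0 + np.exp(-diff))
    new_probs = probs * likelihood
    return new_probs / new_probs.sum()
```

Under these assumptions, each pairwise preference shifts probability mass towards weight vectors that rank the preferred trajectory higher, so a single agent can be steered by the current weight estimate instead of training one policy per trade-off.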