gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

In real-world decision optimization, multiple competing objectives must often be taken into account. In classical reinforcement learning, these objectives have to be combined into a single reward function. In contrast, multi-objective reinforcement learning (MORL) methods learn from vectors of per-objective rewards. Multi-policy MORL optimizes sets of decision policies for various preferences regarding the conflicting objectives. This is especially important when target preferences are not known during training or when preferences change dynamically during application. While it is, in general, straightforward to extend a single-objective reinforcement learning method to MORL via linear scalarization, the solutions reachable by such methods are limited to convex regions of the Pareto front. Non-linear MORL methods such as Thresholded Lexicographic Ordering (TLO) are designed to overcome this limitation. Generalized MORL methods use function approximation to generalize across objective preferences and thereby implicitly learn multiple policies in a data-efficient manner, even for complex decision problems with high-dimensional or continuous state spaces. In this work, we propose generalized Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL. We introduce a deep reinforcement learning realization of the algorithm and present promising results on a standard benchmark for non-linear MORL and on a real-world application from the domain of manufacturing process control.
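To make the contrast between linear scalarization and TLO concrete, the following minimal Python sketch compares greedy action selection under both rules for a single state. It is not the gTLO algorithm from the paper, only an illustration of the commonly described TLO selection rule: every objective except the last is clipped at a threshold, and the clipped vectors are compared lexicographically. The function names, the Q-value vectors, and the threshold are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def linear_scalarization_action(
    q_values: Sequence[Sequence[float]], weights: Sequence[float]
) -> int:
    """Return the index of the action with the largest weighted sum of Q-values."""
    scores = [sum(w * q for w, q in zip(weights, q_vec)) for q_vec in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


def tlo_action(
    q_values: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> int:
    """Return the index of the action preferred under thresholded lexicographic ordering.

    All objectives except the last are clipped at their threshold; the last
    objective is left unclipped. Clipped vectors are compared lexicographically
    (Python tuples already compare this way). `thresholds` is assumed to hold
    one value per constrained objective, i.e. len(q_vec) - 1 entries.
    """
    def clipped(q_vec: Sequence[float]) -> tuple:
        head = [min(q, t) for q, t in zip(q_vec[:-1], thresholds)]
        return tuple(head) + (q_vec[-1],)

    return max(range(len(q_values)), key=lambda a: clipped(q_values[a]))


# Hypothetical two-objective Q-vectors for three actions in one state.
# Action 2 is Pareto-optimal but lies in a concave region of the front.
example_q: List[tuple] = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]

if __name__ == "__main__":
    # No non-negative weight vector summing to one makes action 2 the maximizer.
    picks = {linear_scalarization_action(example_q, (w / 10, 1 - w / 10))
             for w in range(11)}
    print("linear scalarization picks:", sorted(picks))
    # Requiring objective 0 to reach 0.4 and then maximizing objective 1
    # selects action 2.
    print("TLO pick:", tlo_action(example_q, thresholds=[0.4]))
```

In this toy setting, the weighted sum for action 2 is always 0.4, while one of the other two actions always scores at least 0.5, so linear scalarization never selects it. Thresholding the first objective at 0.4 makes actions 0 and 2 tie on that objective, and the second objective then breaks the tie in favor of action 2.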

Preprint, 2022, arXiv, Open access