Paper detail

Optimality Inductive Biases and Agnostic Guidelines for Offline Reinforcement Learning

The performance of state-of-the-art offline RL methods varies widely over the spectrum of dataset qualities, ranging from far-from-optimal random data to close-to-optimal expert demonstrations. We re-implement these methods to test their reproducibility, and show that when a given method outperforms the others on one end of the spectrum, it never does on the other end. This prevents us from naming a victor across the board. We attribute the asymmetry to the amount of inductive bias injected into the agent to entice it to posit that the behavior underlying the offline dataset is optimal for the task. Our investigations confirm that careless injections of such optimality inductive biases make dominant agents subpar as soon as the offline policy is sub-optimal. To bridge this gap, we generalize importance-weighted regression methods that have proved the most versatile across the spectrum of dataset grades into a modular framework that allows for the design of methods that align with how much we know about the dataset. This modularity enables qualitatively different injections of optimality inductive biases. We show that certain orchestrations strike the right balance, improving the return on one end of the spectrum without harming it on the other end. While the formulation of guidelines for the design of an offline method reduces to aligning the amount of optimality bias to inject with what we know about the quality of the data, the design of an agnostic method for which we need not know the quality of the data beforehand is more nuanced. Only our framework allowed us to design a method that performed well across the spectrum while remaining modular if more information about the quality of the data ever becomes available.

preprint2022arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.