Minimax Model Learning

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation (OPE) objective, with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach is more robust to model misspecification or to distribution shift induced by learning or evaluating policies that differ from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based OPE methods. We further show that our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
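To make the idea of a distribution-shift-corrected model loss concrete, here is a minimal sketch in a tabular setting. It assumes finite, hand-picked classes of weight functions w(s,a) and test functions V(s). The weights stand in for density ratios that reweight the logged data toward the policy of interest. The model is scored by the worst-case squared gap between the value it predicts for the next state and the value actually observed. All names (`weighted_model_error`, `minimax_model_loss`, `select_model`) and the squared-error form are illustrative assumptions, not the paper's exact formulation or code.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical tabular types, for illustration only.
Transition = Tuple[int, int, int]  # (state, action, next_state)
Model = List[List[List[float]]]    # P[s][a][s_next] = probability
WeightFn = List[List[float]]       # w[s][a], stands in for a density ratio
ValueFn = List[float]              # V[s], a test function over states


def weighted_model_error(P: Model, data: Sequence[Transition],
                         w: WeightFn, V: ValueFn) -> float:
    """Empirical mean of w(s,a) * (E_{x~P(.|s,a)}[V(x)] - V(s'))."""
    total = 0.0
    for s, a, s_next in data:
        # Expected value of V under the model's predicted next-state distribution.
        predicted = sum(p * V[x] for x, p in enumerate(P[s][a]))
        total += w[s][a] * (predicted - V[s_next])
    return total / len(data)


def minimax_model_loss(P: Model, data: Sequence[Transition],
                       weights: Sequence[WeightFn],
                       values: Sequence[ValueFn]) -> float:
    """Inner maximization: worst-case squared weighted error over both classes."""
    return max(weighted_model_error(P, data, w, V) ** 2
               for w, V in product(weights, values))


def select_model(candidates: Sequence[Model], data: Sequence[Transition],
                 weights: Sequence[WeightFn], values: Sequence[ValueFn]) -> Model:
    """Outer minimization over a finite set of candidate models."""
    return min(candidates,
               key=lambda P: minimax_model_loss(P, data, weights, values))


# Toy usage: 2 states, 1 action, six logged transitions.
data = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 1), (1, 0, 1)]
P_emp = [[[0.25, 0.75]], [[0.0, 1.0]]]  # matches empirical next-state frequencies
P_bad = [[[0.9, 0.1]], [[0.5, 0.5]]]    # a misspecified model
weights = [[[1.0], [1.0]], [[2.0], [0.5]]]
values = [[0.0, 1.0], [1.0, -1.0]]

print(minimax_model_loss(P_emp, data, weights, values))  # prints 0.0
print(minimax_model_loss(P_bad, data, weights, values))  # strictly positive
print(select_model([P_bad, P_emp], data, weights, values) is P_emp)  # prints True
```

Because each weight here is constant within a (state, action) pair, a model that reproduces the empirical next-state frequencies cancels every weighted error term exactly, so its loss is zero. The misspecified model is penalized by whichever (w, V) pair exposes it most. In practice the weight and value classes would be richer function classes and the maximization would be solved approximately rather than by enumeration.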

Authors: Cameron Voloshin, Nan Jiang, Yisong Yue
Topics: Machine Learning, Artificial Intelligence, Robotics

preprint / 2021