
Visualizing MuZero Models

MuZero, a model-based reinforcement learning algorithm that uses a value-equivalent dynamics model, achieved state-of-the-art performance in chess, shogi, and Go. Standard forward dynamics models predict a full next state. Value-equivalent models are instead trained to predict future values, which emphasizes value-relevant information in the learned representations. Value-equivalent models have shown strong empirical success, but no prior research has visualized or investigated what types of representations they actually learn. In this paper, we therefore visualize the latent representations of MuZero agents. We find that, for the same action sequence, the latent trajectory obtained by embedding observations may diverge from the trajectory produced by the internal state-transition dynamics, which could lead to instability during planning. Based on this insight, we propose two regularization techniques to stabilize MuZero's performance. Additionally, we provide an open-source implementation of MuZero along with an interactive visualizer of learned representations, which may aid further investigation of value-equivalent algorithms.
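The difference between value-equivalent training and the latent divergence described above can be illustrated with a toy sketch. It is not the paper's implementation: the one-layer representation function `representation` (h), the dynamics function `dynamics` (g), the mean-pooling `value_head` and reward head, and all weights are hypothetical, and MuZero's policy head, real network architectures, and categorical losses are omitted. The penalty in `regularized_loss` (squared distance between the dynamics rollout and the direct observation embedding, weighted by a hypothetical `lam`) is one generic consistency regularizer, assumed here for illustration; it is not necessarily either of the two techniques the paper proposes.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def representation(obs: Sequence[float], w_h: Matrix) -> Vector:
    """h: observation -> latent state (hypothetical one-layer encoder)."""
    return [math.tanh(z) for z in matvec(w_h, obs)]


def dynamics(state: Vector, action: int, w_g: Matrix,
             n_actions: int) -> Tuple[Vector, float]:
    """g: (latent state, action) -> (next latent state, predicted reward)."""
    one_hot = [1.0 if i == action else 0.0 for i in range(n_actions)]
    nxt = [math.tanh(z) for z in matvec(w_g, state + one_hot)]
    reward = sum(nxt) / len(nxt)  # hypothetical reward head
    return nxt, reward


def value_head(state: Vector) -> float:
    """Value part of f: latent state -> predicted value (hypothetical)."""
    return sum(state) / len(state)


def value_equivalent_loss(obs0, actions, target_values, target_rewards,
                          w_h, w_g, n_actions) -> float:
    """Unroll K steps from h(o_t). Only values and rewards are supervised;
    the latent states themselves are never compared to any target."""
    s = representation(obs0, w_h)
    loss = (value_head(s) - target_values[0]) ** 2
    for k, a in enumerate(actions):
        s, r = dynamics(s, a, w_g, n_actions)
        loss += (r - target_rewards[k]) ** 2
        loss += (value_head(s) - target_values[k + 1]) ** 2
    return loss


def latent_divergence(observations, actions, w_h, w_g, n_actions) -> List[float]:
    """Euclidean distance between the rollout g^k(h(o_t), a_1..a_k) and the
    direct embedding h(o_{t+k}), for k = 1..K."""
    s = representation(observations[0], w_h)
    dists = []
    for k, a in enumerate(actions):
        s, _ = dynamics(s, a, w_g, n_actions)
        emb = representation(observations[k + 1], w_h)
        dists.append(math.dist(s, emb))
    return dists


def regularized_loss(observations, actions, target_values, target_rewards,
                     w_h, w_g, n_actions, lam=0.1) -> float:
    """Value-equivalent loss plus a hypothetical consistency penalty."""
    base = value_equivalent_loss(observations[0], actions, target_values,
                                 target_rewards, w_h, w_g, n_actions)
    penalty = sum(d ** 2 for d in latent_divergence(observations, actions,
                                                    w_h, w_g, n_actions))
    return base + lam * penalty


if __name__ == "__main__":
    # Toy sizes: 3-dim observations, 2-dim latent state, 2 discrete actions.
    w_h = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
    w_g = [[0.6, 0.1, 0.2, -0.3], [-0.2, 0.7, 0.1, 0.4]]  # latent_dim + n_actions columns
    observations = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.0, 0.5, 1.0]]
    actions = [0, 1]
    target_values = [0.5, 0.4, 0.3]  # K + 1 targets
    target_rewards = [0.1, 0.0]      # K targets
    print(latent_divergence(observations, actions, w_h, w_g, 2))
    print(regularized_loss(observations, actions, target_values,
                           target_rewards, w_h, w_g, 2))
```

Because `value_equivalent_loss` never compares latent states to anything, nothing in it keeps the rollout close to h(o_{t+k}); `latent_divergence` measures that gap, and the added penalty pulls the two trajectories together at the cost of constraining the latent space beyond what value prediction requires.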

Preprint, 2021
Authors: Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat
Topics: Machine Learning, Artificial Intelligence
