Neural Replicator Dynamics

Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstationarity. By contrast, it is known that the replicator dynamics, a well-studied model from evolutionary game theory, eliminates dominated strategies and exhibits convergence of the time-averaged trajectories to interior Nash equilibria in zero-sum games. Thus, using the replicator dynamics as a foundation, we derive an elegant one-line change to policy gradient methods that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD). NeuRD reduces to the exponential weights/Hedge algorithm in the single-state all-actions case. Additionally, NeuRD has formal equivalence to softmax counterfactual regret minimization, which guarantees convergence in the sequential tabular case. Importantly, our algorithm provides a straightforward w
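
The "one-line change" described in the abstract can be illustrated in the single-state, all-actions case. The following is a minimal sketch, not the authors' implementation: it assumes tabular logits, a known action-value vector `q`, and a fixed step size, and the function names (`spg_step`, `neurd_step`, `hedge_policy`) are hypothetical. The softmax policy gradient step scales each action's advantage by its current probability, while the NeuRD-style step applies the advantage to the logits directly.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def spg_step(logits, q, lr):
    """All-actions softmax policy gradient step (single state).

    The gradient of sum_a pi(a) * q(a) with respect to logit a is
    pi(a) * (q(a) - v), where v = sum_b pi(b) * q(b).
    """
    pi = softmax(logits)
    adv = q - pi @ q
    return logits + lr * pi * adv


def neurd_step(logits, q, lr):
    """NeuRD-style step: same advantage, but the pi(a) factor that
    comes from differentiating through the softmax is dropped."""
    pi = softmax(logits)
    adv = q - pi @ q
    return logits + lr * adv


def hedge_policy(cumulative_q, lr):
    """Exponential weights / Hedge policy from cumulative action values."""
    return softmax(lr * cumulative_q)
```

In this sketch, starting from zero logits, the policy produced by repeated `neurd_step` calls matches `hedge_policy` applied to the summed action values, because subtracting the baseline `v` only shifts all logits by the same constant. The difference from `spg_step` shows up when the policy is nearly deterministic and the values then change: the `pi * adv` update for a low-probability action is close to zero, whereas the NeuRD-style update still moves its logit by the full advantage.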

Authors: Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Karl Tuyls, Edgar Duenez-Guzman, Paavo Parmas

Topics: Machine Learning, Artificial Intelligence

preprint / 2020
