
Competitive Policy Optimization

A core challenge in policy optimization for competitive Markov decision processes is designing efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods use only linear approximations and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient and (ii) trust-region competitive policy optimization. We study these methods theoretically and investigate their behavior empirically on a comprehensive yet challenging set of competitive games. We observe that they provide stable optimization, converge to sophisticated strategies, and achieve higher scores when playing against baseline policy gradient methods.
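To make the difference between a linear and a bilinear approximation concrete, the sketch below compares simultaneous gradient descent-ascent (each player linearizes the objective and ignores the other) with a competitive gradient descent step (each update is the Nash equilibrium of a regularized bilinear local game that includes the mixed second derivatives). It is a toy illustration only, not CoPO itself: it assumes a hypothetical zero-sum matrix game f(x, y) = xᵀAy with closed-form derivatives, whereas CoPO works on policy parameters with gradient and interaction terms estimated from trajectories. All function names (`gda_step`, `cgd_step`, `run`) are hypothetical.

```python
import numpy as np

# Hypothetical toy zero-sum game f(x, y) = x^T A y:
# player x minimizes f, player y maximizes f.
# Closed-form derivatives stand in for sampled policy-gradient estimates.


def gda_step(x, y, A, eta):
    """Simultaneous gradient descent-ascent: each player uses a linear model only."""
    gx = A @ y    # grad_x f
    gy = A.T @ x  # grad_y f
    return x - eta * gx, y + eta * gy


def cgd_step(x, y, A, eta):
    """Competitive gradient descent step: the Nash equilibrium of the local
    bilinear approximation with quadratic regularization 1/(2*eta)."""
    gx = A @ y
    gy = A.T @ x
    Dxy = A    # mixed second derivative d^2 f / dx dy
    Dyx = A.T  # mixed second derivative d^2 f / dy dx
    nx, ny = x.shape[0], y.shape[0]
    # x: (I + eta^2 Dxy Dyx) dx = -eta (gx + eta Dxy gy)
    dx = -eta * np.linalg.solve(np.eye(nx) + eta**2 * Dxy @ Dyx, gx + eta * Dxy @ gy)
    # y: (I + eta^2 Dyx Dxy) dy = eta (gy - eta Dyx gx)
    dy = eta * np.linalg.solve(np.eye(ny) + eta**2 * Dyx @ Dxy, gy - eta * Dyx @ gx)
    return x + dx, y + dy


def run(step, A, x0, y0, eta=0.2, iters=100):
    """Iterate a step rule and return the distance from the equilibrium (0, 0)."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x, y = step(x, y, A, eta)
    return float(np.sqrt(np.sum(x**2) + np.sum(y**2)))


if __name__ == "__main__":
    A = np.array([[1.0]])  # scalar game f(x, y) = x * y
    x0, y0 = np.array([1.0]), np.array([1.0])
    print("GDA distance:", run(gda_step, A, x0, y0))
    print("CGD distance:", run(cgd_step, A, x0, y0))
```

For the scalar game f(x, y) = xy, each gradient descent-ascent step multiplies the distance to the equilibrium by sqrt(1 + η²), so the iterates spiral outward, while each competitive step divides it by the same factor. Capturing the interaction term D²ₓᵧf is what turns the divergent linear update into a convergent one in this example.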

Authors: Manish Prajapat, Kamyar Azizzadenesheli, Alexander Liniger, Yisong Yue, Anima Anandkumar
Topics: Machine Learning; Multiagent Systems; Computer Science and Ga...

preprint / 2020
