Researcher profile

Georgios Piliouras

Georgios Piliouras contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
30works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

30 published item(s)

preprint2026arXiv

Optimism Without Regularization: Constant Regret in Zero-Sum Games

This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL -- a regularized algorithm with a bounded stepsize parameter -- obtains constant regret in this setting, we show for the first time that similar, optimal rates are also achievable without regularization: we prove for two-strategy games that Optimistic Fictitious Play (using any tiebreaking rule) obtains only constant regret, providing surprising new evidence on the ability of non-no-regret algorithms for fast learning in games. Our proof technique leverages a geometric view of Optimistic Fictitious Play in the dual space of payoff vectors, where we show a certain energy function of the iterates remains bounded over time. Additionally, we also prove a regret lower bound of $Ω(\sqrt{T})$ for Alternating Fictitious Play. In the unregularized regime, this separates the ability of optimism and alternation in achieving $o(\sqrt{T})$ regret.

preprint2026arXiv

When and Why is Optimistic Multiplicative Weights Slow? The Geometry of Energy Dissipation

This paper studies the convergence of the Optimistic Multiplicative Weights Update algorithm (OMWU) in two player zero-sum games. Recent works have identified instances on which the last-iterate of OMWU can converge arbitrarily slowly, but understanding when and why this slow convergence occurs has remained open. In this work, we develop a new analysis framework that gives sharp, quantitative explanations for this behavior. Our analysis is based on viewing the algorithm's dual iterates as an optimistic skew-gradient descent with respect to an energy function. We prove over the dual iterates that energy is dissipative, and by establishing tight bounds on the magnitude of dissipation, our analysis quantifies the geometric bottlenecks that arise when the corresponding primal iterates are close to the simplex boundary. This further translates into a new linear last-iterate convergence rate in KL divergence on games with a unique and interior Nash equilibrium. Compared to prior work, this new rate contains a much sharper dependence on game-specific constants, and we prove this dependence is optimal. Moreover, these geometric insights further translate into new separations on uniform convergence rates for OMWU. On the one hand, we prove constant lower bounds on the uniform best-iterate convergence rate in KL divergence and total variation distance from Nash. On the other hand, we establish for the $2\times 2$ setting a new ${\widetilde O}(T^{-1/2})$ best-iterate rate in duality gap, improving substantially over prior work. Together, this shows in general that uniform convergence rate guarantees do not transfer across different measures of distance to Nash.

preprint2023arXiv

Heterogeneous Beliefs and Multi-Population Learning in Network Games

The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from being well-understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP).We show that the system state -- a probability distribution over beliefs -- evolves according to a system of partial differential equations akin to the continuity equations that commonly desccribe transport phenomena in physical systems. We establish the convergence of SFP to Quantal Response Equilibria in different classes of games capturing both network competition as well as network coordination. We also prove that the beliefs will eventually homogenize in all network games. Although the initial belief heterogeneity disappears in the limit, we show that it plays a crucial role for equilibrium selection in the case of coordination games as it helps select highly desirable equilibria. Contrary, in the case of network competition, the resulting limit behavior is independent of the initialization of beliefs, even when the underlying game has many distinct Nash equilibria.

preprint2023arXiv

Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps

In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accuracy $ε$ with only $O(\log 1/ε)$ additional gradient-calls through the iterations of a contraction map. Then combining the analysis of (PP) method with an error-propagation analysis we establish that the resulting first order method, called Clairvoyant Extra Gradient, admits near-optimal time-average convergence for general domains and last-iterate convergence in the unconstrained case.

preprint2022arXiv

Alternating Mirror Descent for Constrained Min-Max Games

In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies the algorithm's better behavior than the simultaneous version of mirror descent algorithm, which is known to diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of alternating gradient descent algorithm for zero-sum games which was studied in (Bailey et al., COLT 2020).

preprint2022arXiv

Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update

In this paper, we provide a novel and simple algorithm, Clairvoyant Multiplicative Weights Updates (CMWU) for regret minimization in general games. CMWU effectively corresponds to the standard MWU algorithm but where all agents, when updating their mixed strategies, use the payoff profiles based on tomorrow's behavior, i.e. the agents are clairvoyant. CMWU achieves constant regret of $\ln(m)/η$ in all normal-form games with m actions and fixed step-sizes $η$. Although CMWU encodes in its definition a fixed point computation, which in principle could result in dynamics that are neither computationally efficient nor uncoupled, we show that both of these issues can be largely circumvented. Specifically, as long as the step-size $η$ is upper bounded by $\frac{1}{(n-1)V}$, where $n$ is the number of agents and $[0,V]$ is the payoff range, then the CMWU updates can be computed linearly fast via a contraction map. This implementation results in an uncoupled online learning dynamic that admits a $O (\log T)$-sparse sub-sequence where each agent experiences at most $O(nV\log m)$ regret. This implies that the CMWU dynamics converge with rate $O(nV \log m \log T / T)$ to a \textit{Coarse Correlated Equilibrium}. The latter improves on the current state-of-the-art convergence rate of \textit{uncoupled online learning dynamics} \cite{daskalakis2021near,anagnostides2021near}.

preprint2022arXiv

Data-Driven Models of Selfish Routing: Why Price of Anarchy Does Depend on Network Topology

We investigate traffic routing both from the perspective of theory as well as real world data. First, we introduce a new type of games: $θ$-free flow games. Here, commuters only consider, in their strategy sets, paths whose free-flow costs (informally their lengths) are within a small multiplicative $(1+θ)$ constant of the optimal free-flow cost path connecting their source and destination, where $θ\geq0$. We provide an exhaustive analysis of tight bounds on PoA($θ$) for arbitrary classes of cost functions, both in the case of general congestion/routing games as well as in the special case of path-disjoint networks. Second, by using a large mobility dataset in Singapore, we inspect minute-by-minute decision-making of thousands of commuters, and find that $θ=1$ is a good estimate of agents' route (pre)selection mechanism. In contrast, in Pigou networks, the ratio of the free-flow costs of the routes, and thus $θ$, is \textit{infinite}; so, although such worst case networks are mathematically simple, they correspond to artificial routing scenarios with little resemblance to real world conditions, opening the possibility of proving much stronger Price of Anarchy guarantees by explicitly studying their dependency on $θ$. For example, in the case of the standard Bureau of Public Roads (BPR) cost model, where$c_e(x)= a_e x^4+b_e$, and for quartic cost functions in general, the standard PoA bound for $θ=\infty$ is $2.1505$, and this is tight both for general networks as well as path-disjoint and even parallel-edge networks. In comparison, for $θ=1$, the PoA in the case of general networks is only $1.6994$, whereas for path-disjoint/parallel-edge networks is even smaller ($1.3652$), showing that both the route geometries as captured by the parameter $θ$ as well as the network topology have significant effects on PoA.

preprint2022arXiv

Fast Convergence of Optimistic Gradient Ascent in Network Zero-Sum Extensive Form Games

The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum property of agent payoffs, the efficient representation of graphical games as well the expressive power of EFGs. We examine the convergence properties of Optimistic Gradient Ascent (OGA) in these games. We prove that the time-average behavior of such online learning dynamics exhibits $O(1/T)$ rate convergence to the set of Nash Equilibria. Moreover, we show that the day-to-day behavior also converges to Nash with rate $O(c^{-t})$ for some game-dependent constant $c>0$.

preprint2022arXiv

Learning Correlated Equilibria in Mean-Field Games

The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.

preprint2022arXiv

Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

Recent advances in multiagent learning have seen the introduction ofa family of algorithms that revolve around the population-based trainingmethod PSRO, showing convergence to Nash, correlated and coarse corre-lated equilibria. Notably, when the number of agents increases, learningbest-responses becomes exponentially more difficult, and as such ham-pers PSRO training methods. The paradigm of mean-field games pro-vides an asymptotic solution to this problem when the considered gamesare anonymous-symmetric. Unfortunately, the mean-field approximationintroduces non-linearities which prevent a straightforward adaptation ofPSRO. Building upon optimization and adversarial regret minimization,this paper sidesteps this issue and introduces mean-field PSRO, an adap-tation of PSRO which learns Nash, coarse correlated and correlated equi-libria in mean-field games. The key is to replace the exact distributioncomputation step by newly-defined mean-field no-adversarial-regret learn-ers, or by black-box optimization. We compare the asymptotic complexityof the approach to standard PSRO, greatly improve empirical bandit con-vergence speed by compressing temporal mixture weights, and ensure itis theoretically robust to payoff noise. Finally, we illustrate the speed andaccuracy of mean-field PSRO on several mean-field games, demonstratingconvergence to strong and weak equilibria.

preprint2022arXiv

Multi-agent Performative Prediction: From Global Stability and Optimality to Chaos

The recent framework of performative prediction is aimed at capturing settings where predictions influence the target/outcome they want to predict. In this paper, we introduce a natural multi-agent version of this framework, where multiple decision makers try to predict the same outcome. We showcase that such competition can result in interesting phenomena by proving the possibility of phase transitions from stability to instability and eventually chaos. Specifically, we present settings of multi-agent performative prediction where under sufficient conditions their dynamics lead to global stability and optimality. In the opposite direction, when the agents are not sufficiently cautious in their learning/updates rates, we show that instability and in fact formal chaos is possible. We complement our theoretical predictions with simulations showcasing the predictive power of our results.

preprint2022arXiv

Nash, Conley, and Computation: Impossibility and Incompleteness in Game Dynamics

Under what conditions do the behaviors of players, who play a game repeatedly, converge to a Nash equilibrium? If one assumes that the players' behavior is a discrete-time or continuous-time rule whereby the current mixed strategy profile is mapped to the next, this becomes a problem in the theory of dynamical systems. We apply this theory, and in particular the concepts of chain recurrence, attractors, and Conley index, to prove a general impossibility result: there exist games for which any dynamics is bound to have starting points that do not end up at a Nash equilibrium. We also prove a stronger result for $ε$-approximate Nash equilibria: there are games such that no game dynamics can converge (in an appropriate sense) to $ε$-Nash equilibria, and in fact the set of such games has positive measure. Further numerical results demonstrate that this holds for any $ε$ between zero and $0.09$. Our results establish that, although the notions of Nash equilibria (and its computation-inspired approximations) are universally applicable in all games, they are also fundamentally incomplete as predictors of long term behavior, regardless of the choice of dynamics.

preprint2022arXiv

No-Regret Learning in Games is Turing Complete

Games are natural models for multi-agent machine learning settings, such as generative adversarial networks (GANs). The desirable outcomes from algorithmic interactions in these games are encoded as game theoretic equilibrium concepts, e.g. Nash and coarse correlated equilibria. As directly computing an equilibrium is typically impractical, one often aims to design learning algorithms that iteratively converge to equilibria. A growing body of negative results casts doubt on this goal, from non-convergence to chaotic and even arbitrary behaviour. In this paper we add a strong negative result to this list: learning in games is Turing complete. Specifically, we prove Turing completeness of the replicator dynamic on matrix games, one of the simplest possible settings. Our results imply the undecicability of reachability problems for learning algorithms in games, a special case of which is determining equilibrium convergence.

preprint2022arXiv

Poincaré-Bendixson Limit Sets in Multi-Agent Learning

A key challenge of evolutionary game theory and multi-agent learning is to characterize the limit behavior of game dynamics. Whereas convergence is often a property of learning algorithms in games satisfying a particular reward structure (e.g., zero-sum games), even basic learning models, such as the replicator dynamics, are not guaranteed to converge for general payoffs. Worse yet, chaotic behavior is possible even in rather simple games, such as variants of the Rock-Paper-Scissors game. Although chaotic behavior in learning dynamics can be precluded by the celebrated Poincaré-Bendixson theorem, it is only applicable to low-dimensional settings. Are there other characteristics of a game that can force regularity in the limit sets of learning? We show that behavior consistent with the Poincaré-Bendixson theorem (limit cycles, but no chaotic attractor) can follow purely from the topological structure of the interaction graph, even for high-dimensional settings with an arbitrary number of players and arbitrary payoff matrices. We prove our result for a wide class of follow-the-regularized leader (FoReL) dynamics, which generalize replicator dynamics, for binary games characterized interaction graphs where the payoffs of each player are only affected by one other player (i.e., interaction graphs of indegree one). Since chaos occurs already in games with only two players and three strategies, this class of non-chaotic games may be considered maximal. Moreover, we provide simple conditions under which such behavior translates into efficiency guarantees, implying that FoReL learning achieves time-averaged sum of payoffs at least as good as that of a Nash equilibrium, thereby connecting the topology of the dynamics to social-welfare analysis.

preprint2022arXiv

Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from being trivial in the case of non-linear function approximation that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.

preprint2022arXiv

Transaction Fees on a Honeymoon: Ethereum's EIP-1559 One Month Later

Ethereum Improvement Proposal (EIP) 1559 was recently implemented to transform Ethereum's transaction fee market. EIP-1559 utilizes an algorithmic update rule with a constant learning rate to estimate a base fee. The base fee reflects prevailing network conditions and hence provides a more reliable oracle for current gas prices. Using on-chain data from the period after its launch, we evaluate the impact of EIP-1559 on the user experience and market performance. Our empirical findings suggest that although EIP-1559 achieves its goals on average, short-term behavior is marked by intense, chaotic oscillations in block sizes (as predicted by our recent theoretical dynamical system analysis [1]) and slow adjustments during periods of demand bursts (e.g., NFT drops). Both phenomena lead to unwanted inter-block variability in mining rewards. To address this issue, we propose an alternative base fee adjustment rule in which the learning rate varies according to an additive increase, multiplicative decrease (AIMD) update scheme. Our simulations show that the latter robustly outperforms the EIP-1559 protocol under various demand scenarios. These results provide evidence that variable learning rate mechanisms may constitute a promising alternative to the default EIP-1559-based format and contribute to the ongoing discussion on the design of more efficient transaction fee markets.

preprint2022arXiv

Unpredictable dynamics in congestion games: memory loss can prevent chaos

We study the dynamics of simple congestion games with two resources where a continuum of agents behaves according to a version of Experience-Weighted Attraction (EWA) algorithm. The dynamics is characterized by two parameters: the (population) intensity of choice $a>0$ capturing the economic rationality of the total population of agents and a discount factor $σ\in [0,1]$ capturing a type of memory loss where past outcomes matter exponentially less than the recent ones. Finally, our system adds a third parameter $b \in (0,1)$, which captures the asymmetry of the cost functions of the two resources. It is the proportion of the agents using the first resource at Nash equilibrium, with $b=1/2$ capturing a symmetric network. Within this simple framework, we show a plethora of bifurcation phenomena where behavioral dynamics destabilize from global convergence to equilibrium, to limit cycles or even (formally proven) chaos as a function of the parameters $a$, $b$ and $σ$. Specifically, we show that for any discount factor $σ$ the system will be destabilized for a sufficiently large intensity of choice $a$. Although for discount factor $σ=0$ almost always (i.e., $b \neq 1/2$) the system will become chaotic, as $σ$ increases the chaotic regime will give place to the attracting periodic orbit of period 2. Therefore, memory loss can simplify game dynamics and make the system predictable. We complement our theoretical analysis with simulations and several bifurcation diagrams that showcase the unyielding complexity of the population dynamics (e.g., attracting periodic orbits of different lengths) even in the simplest possible potential games.

preprint2021arXiv

Follow-the-Regularized-Leader Routes to Chaos in Routing Games

We study the emergence of chaotic behavior of Follow-the-Regularized Leader (FoReL) dynamics in games. We focus on the effects of increasing the population size or the scale of costs in congestion games, and generalize recent results on unstable, chaotic behaviors in the Multiplicative Weights Update dynamics to a much larger class of FoReL dynamics. We establish that, even in simple linear non-atomic congestion games with two parallel links and any fixed learning rate, unless the game is fully symmetric, increasing the population size or the scale of costs causes learning dynamics to become unstable and eventually chaotic, in the sense of Li-Yorke and positive topological entropy. Furthermore, we show the existence of novel non-standard phenomena such as the coexistence of stable Nash equilibria and chaos in the same game. We also observe the simultaneous creation of a chaotic attractor as another chaotic attractor gets destroyed. Lastly, although FoReL dynamics can be strange and non-equilibrating, we prove that the time average still converges to an exact equilibrium for any choice of learning rate and any scale of costs.

preprint2021arXiv

Learning in Matrix Games can be Arbitrarily Complex

A growing number of machine learning architectures, such as Generative Adversarial Networks, rely on the design of games which implement a desired functionality via a Nash equilibrium. In practice these games have an implicit complexity (e.g. from underlying datasets and the deep networks used) that makes directly computing a Nash equilibrium impractical or impossible. For this reason, numerous learning algorithms have been developed with the goal of iteratively converging to a Nash equilibrium. Unfortunately, the dynamics generated by the learning process can be very intricate and instances of training failure hard to interpret. In this paper we show that, in a strong sense, this dynamic complexity is inherent to games. Specifically, we prove that replicator dynamics, the continuous-time analogue of Multiplicative Weights Update, even when applied in a very restricted class of games -- known as finite matrix games -- is rich enough to be able to approximate arbitrary dynamical systems. Our results are positive in the sense that they show the nearly boundless dynamic modelling capabilities of current machine learning practices, but also negative in implying that these capabilities may come at the cost of interpretability. As a concrete example, we show how replicator dynamics can effectively reproduce the well-known strange attractor of Lonrenz dynamics (the "butterfly effect") while achieving no regret.

preprint2021arXiv

Scaling up Mean Field Games with Online Mirror Descent

We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.

preprint2021arXiv

Solving Min-Max Optimization with Hidden Structure via Gradient Descent Ascent

Many recent AI architectures are inspired by zero-sum games, however, the behavior of their dynamics is still not well understood. Inspired by this, we study standard gradient descent ascent (GDA) dynamics in a specific class of non-convex non-concave zero-sum games, that we call hidden zero-sum games. In this class, players control the inputs of smooth but possibly non-linear functions whose outputs are being applied as inputs to a convex-concave game. Unlike general zero-sum games, these games have a well-defined notion of solution; outcomes that implement the von-Neumann equilibrium of the "hidden" convex-concave game. We prove that if the hidden game is strictly convex-concave then vanilla GDA converges not merely to local Nash, but typically to the von-Neumann solution. If the game lacks strict convexity properties, GDA may fail to converge to any equilibrium, however, by applying standard regularization techniques we can prove convergence to a von-Neumann solution of a slightly perturbed zero-sum game. Our convergence guarantees are non-local, which as far as we know is a first-of-its-kind type of result in non-convex non-concave games. Finally, we discuss connections of our framework with generative adversarial networks.

preprint2020arXiv

Broken Detailed Balance and Non-Equilibrium Dynamics in Noisy Social Learning Models

We propose new Degroot-type social learning models with feedback in a continuous time, to investigate the effect of a noisy information source on consensus formation in a social network. Unlike the standard Degroot framework, noisy information models destroy consensus formation. On the other hand, the noisy opinion dynamics converge to the equilibrium distribution that encapsulates correlations among agents' opinions. Interestingly, such an equilibrium distribution is also a non-equilibrium steady state (NESS) with a non-zero probabilistic current loop. Thus, noisy information source leads to a NESS at long times that encodes persistent correlated opinion dynamics of learning agents. Our model provides a simple realization of NESS in the context of social learning. Other phenomena such as synchronization of opinions when agents are subject to a common noise are also studied.

preprint2020arXiv

Catastrophe by Design in Population Games: Destabilizing Wasteful Locked-in Technologies

In multi-agent environments in which coordination is desirable, the history of play often causes lock-in at sub-optimal outcomes. Notoriously, technologies with a significant environmental footprint or high social cost persist despite the successful development of more environmentally friendly and/or socially efficient alternatives. The displacement of the status quo is hindered by entrenched economic interests and network effects. To exacerbate matters, the standard mechanism design approaches based on centralized authorities with the capacity to use preferential subsidies to effectively dictate system outcomes are not always applicable to modern decentralized economies. What other types of mechanisms are feasible? In this paper, we develop and analyze a mechanism that induces transitions from inefficient lock-ins to superior alternatives. This mechanism does not exogenously favor one option over another -- instead, the phase transition emerges endogenously via a standard evolutionary learning model, Q-learning, where agents trade-off exploration and exploitation. Exerting the same transient influence to both the efficient and inefficient technologies encourages exploration and results in irreversible phase transitions and permanent stabilization of the efficient one. On a technical level, our work is based on bifurcation and catastrophe theory, a branch of mathematics that deals with changes in the number and stability properties of equilibria. Critically, our analysis is shown to be structurally robust to significant and even adversarially chosen perturbations to the parameters of both our game and our behavioral model.

preprint2020arXiv

Chaos, Extremism and Optimism: Volume Analysis of Learning in Games

We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games. Such analyses provide new insights into these game dynamical systems, which seem hard to achieve via the classical techniques within Computer Science and Machine Learning. The first step is to examine these dynamics not in their original space (simplex of actions) but in a dual space (aggregate payoff space of actions). The second step is to explore how the volume of a set of initial conditions evolves over time when it is pushed forward according to the algorithm. This is reminiscent of approaches in Evolutionary Game Theory where replicator dynamics, the continuous-time analogue of MWU, is known to always preserve volume in all games. Interestingly, when we examine discrete-time dynamics, both the choice of the game and the choice of the algorithm play a critical role. So whereas MWU expands volume in zero-sum games and is thus Lyapunov chaotic, we show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior. However, we also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts. Using these tools, we prove two novel, rather negative properties of MWU in zero-sum games: (1) Extremism: even in games with unique fully mixed Nash equilibrium, the system recurrently gets stuck near pure-strategy profiles, despite them being clearly unstable from game theoretic perspective. (2) Unavoidability: given any set of good points (with your own interpretation of "good"), the system cannot avoid bad points indefinitely.

preprint2020arXiv

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergence guarantees in monotone games. We continue by showing how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium. Finally, we show how these insights can be directly used to build state-of-the-art model-free algorithms for zero-sum two-player Imperfect Information Games (IIG).

preprint2020arXiv

Learning Agents in Black-Scholes Financial Markets: Consensus Dynamics and Volatility Smiles

Black-Scholes (BS) is the standard mathematical model for option pricing in financial markets. Option prices are calculated using an analytical formula whose main inputs are strike (at which price to exercise) and volatility. The BS framework assumes that volatility remains constant across all strikes, however, in practice it varies. How do traders come to learn these parameters? We introduce natural models of learning agents, in which they update their beliefs about the true implied volatility based on the opinions of other traders. We prove convergence of these opinion dynamics using techniques from control theory and leader-follower models, thus providing a resolution between theory and market practices. We allow for two different models, one with feedback and one with an unknown leader.

preprint2020arXiv

Multiagent Evaluation under Incomplete Information

This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called α-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or α-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.

preprint2020arXiv

Multiplicative Weights Update as a Distributed Constrained Optimization Algorithm: Convergence to Second-order Stationary Points Almost Always

Non-concave maximization has been the subject of much recent study in the optimization and machine learning communities, specifically in deep learning. Recent papers Ge et al, Lee et al (and references therein) indicate that first order methods work well and avoid saddle points. Results as in Lee et al, however, are limited to the \textit{unconstrained} case or for cases where the critical points are in the interior of the feasibility set, which fail to capture some of the most interesting applications. In this paper we focus on \textit{constrained} non-concave maximization. We analyze a variant of a well-established algorithm in machine learning called Multiplicative Weights Update (MWU) for the maximization problem $\max_{\mathbf{x} \in D} P(\mathbf{x})$, where $P$ is non-concave, twice continuously differentiable and $D$ is a product of simplices. We show that MWU converges almost always for small enough stepsizes to critical points that satisfy the second order KKT conditions. We combine techniques from dynamical systems as well as taking advantage of a recent connection between Baum Eagon inequality and MWU (Palaiopanos et al).

preprint2020arXiv

Smooth markets: A basic mechanism for organizing gradient-based learners

With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.

preprint2019arXiv

The route to chaos in routing games: When is Price of Anarchy too optimistic?

Routing games are amongst the most studied classes of games. Their two most well-known properties are that learning dynamics converge to equilibria and that all equilibria are approximately optimal. In this work, we perform a stress test for these classic results by studying the ubiquitous dynamics, Multiplicative Weights Update, in different classes of congestion games, uncovering intricate non-equilibrium phenomena. As the system demand increases, the learning dynamics go through period-doubling bifurcations, leading to instabilities, chaos and large inefficiencies even in the simplest case of non-atomic routing games with two paths of linear cost where the Price of Anarchy is equal to one. Starting with this simple class, we show that every system has a carrying capacity, above which it becomes unstable. If the equilibrium flow is a symmetric $50-50\%$ split, the system exhibits one period-doubling bifurcation. A single periodic attractor of period two replaces the attracting fixed point. Although the Price of Anarchy is equal to one, in the large population limit the time-average social cost for all but a zero measure set of initial conditions converges to its worst possible value. For asymmetric equilibrium flows, increasing the demand eventually forces the system into Li-Yorke chaos with positive topological entropy and periodic orbits of all possible periods. Remarkably, in all non-equilibrating regimes, the time-average flows on the paths converge exactly to the equilibrium flows, a property akin to no-regret learning in zero-sum games. These results are robust. We extend them to routing games with arbitrarily many strategies, polynomial cost functions, non-atomic as well as atomic routing games and heteregenous users. Our results are also applicable to any sequence of shrinking learning rates, e.g., $1/\sqrt{T}$, by allowing for a dynamically increasing population size.