Source author record

Rahul Savani

Rahul Savani appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computer Science and Game Theory; Machine Learning; Computational Complexity; Artificial Intelligence; Data Structures and Algorithms; Computer Vision; Cryptography and Security; Computers and Society (cs.CY); Formal Languages and Automata Theory; Logic in Computer Science; Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM); Trading and Market Microstructure (q-fin.TR)

Catalog footprint

What is connected

23 works

13 topics

4 close collaborators


preprint · 2026 · arXiv

Differential Privacy in the Extensive-Form Bandit Problem

We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $ε$-local differential privacy and attains a regret of $\tilde{O}(\sqrt{A\ln(S)T}/ε)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of the learner's possible reduced strategies, and $T$ is the number of trials. On each trial, the time complexity of our algorithm is, up to a factor logarithmic in the maximum number of actions at an infoset, equal to the time required for the server to transmit the reduced strategy to the user. We note that local differential privacy is the strongest version of differential privacy and, to the best of our knowledge, this is the first work to study differential privacy of any form in the extensive-form bandit problem.

preprint · 2022 · arXiv

Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule's coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: the game signature. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case and has local convergence guarantees for zero-sum bimatrix games, and we show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.

preprint · 2022 · arXiv

Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash; however, we demonstrate through extensive experiments the ability of this approach to approximate a unique Nash in normal-form games with as many as seven players and twenty-one actions (several billion outcomes) that are orders of magnitude larger than those possible with prior algorithms.

preprint · 2021 · arXiv

A deep learning approach to identify unhealthy advertisements in street view images

While outdoor advertisements are common features within towns and cities, they may reinforce social inequalities in health. Vulnerable populations in deprived areas may have greater exposure to fast food, gambling and alcohol advertisements encouraging their consumption. Understanding who is exposed and evaluating potential policy restrictions requires a substantial manual data collection effort. To address this problem we develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street-level images. We introduce the Liverpool 360 Street View (LIV360SV) dataset for evaluating our workflow. The dataset contains 25,349 360-degree street-level images collected via cycling with a GoPro Fusion camera, recorded January 14-18, 2020. 10,106 advertisements were identified and classified as food (1335), alcohol (217), gambling (149) and other (8405) (e.g., cars and broadband). We find evidence of social inequalities with a larger proportion of food advertisements located within deprived areas and those frequented by students. Our project presents a novel implementation for the incidental classification of street view images for identifying unhealthy advertisements, providing a means through which to identify areas that can benefit from tougher advertisement restriction policies for tackling social inequalities.

preprint · 2020 · arXiv

A Natural Actor-Critic Algorithm with Downside Risk Constraints

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
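As a rough illustration of the risk measure named in this abstract, the sketch below estimates a lower partial moment from sampled returns. The function name, the `target` threshold and the `order` parameter are assumptions made for this example; the abstract does not state which target or order the paper uses, and this is a plain Monte Carlo estimate, not the paper's Bellman-equation proxy.

```python
from typing import Sequence


def lower_partial_moment(returns: Sequence[float], target: float, order: int = 2) -> float:
    """Sample estimate of E[max(target - X, 0) ** order].

    Only returns that fall below `target` contribute, which is what makes
    this a downside (one-sided) risk measure. `target` and `order` are
    illustrative parameters, not values taken from the paper.
    """
    if not returns:
        raise ValueError("need at least one sampled return")
    shortfalls = (max(target - r, 0.0) ** order for r in returns)
    return sum(shortfalls) / len(returns)
```

With `order=2` and `target` set to the mean return this quantity is the semivariance, so gains above the target are never penalised.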

preprint · 2020 · arXiv

One-Clock Priced Timed Games are PSPACE-hard

The main result of this paper is that computing the value of a one-clock priced timed game (OCPTG) is PSPACE-hard. Along the way, we provide a family of OCPTGs that have an exponential number of event points. Both results hold even in very restricted classes of games such as DAGs with treewidth three. Finally, we provide a number of positive results, including polynomial-time algorithms for even more restricted classes of OCPTGs such as trees.

preprint · 2020 · arXiv

Robust Market Making via Adversarial Reinforcement Learning

We show that adversarial reinforcement learning (ARL) can be used to produce market making agents that are robust to adversarial and adaptively-chosen market conditions. To apply ARL, we turn the well-studied single-agent model of Avellaneda and Stoikov [2008] into a discrete-time zero-sum game between a market maker and adversary. The adversary acts as a proxy for other market participants that would like to profit at the market maker's expense. We empirically compare two conventional single-agent RL agents with ARL, and show that our ARL approach leads to: 1) the emergence of risk-averse behaviour without constraints or domain-specific penalties; 2) significant improvements in performance across a set of standard metrics, evaluated with or without an adversary in the test environment; and 3) improved robustness to model uncertainty. We empirically demonstrate that our ARL method consistently converges, and we prove for several special cases that the profiles that we converge to correspond to Nash equilibria in a simplified single-stage game.

preprint · 2020 · arXiv

The Automated Inspection of Opaque Liquid Vaccines

In the pharmaceutical industry the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial's surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our data set by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results we get when we train only on the unaugmented dataset.

preprint · 2020 · arXiv

Tree Polymatrix Games are PPAD-hard

We prove that it is PPAD-hard to compute a Nash equilibrium in a tree polymatrix game with twenty actions per player. This is the first PPAD hardness result for a game with a constant number of actions per player where the interaction graph is acyclic. Along the way we show PPAD-hardness for finding an $ε$-fixed point of a 2D LinearFIXP instance, when $ε$ is any constant less than $(\sqrt{2} - 1)/2 \approx 0.2071$. This lifts the hardness regime from polynomially small approximations in $k$-dimensions to constant approximations in two-dimensions, and our constant is substantial when compared to the trivial upper bound of $0.5$.

preprint · 2016 · arXiv

An Empirical Study on Computing Equilibria in Polymatrix Games

The Nash equilibrium is an important benchmark for behaviour in systems of strategic autonomous agents. Polymatrix games are a succinct and expressive representation of multiplayer games that model pairwise interactions between players. The empirical performance of algorithms to solve these games has received little attention, despite their wide-ranging applications. In this paper we carry out a comprehensive empirical study of two prominent algorithms for computing a sample equilibrium in these games, Lemke's algorithm that computes an exact equilibrium, and a gradient descent method that computes an approximate equilibrium. Our study covers games arising from a number of interesting applications. We find that Lemke's algorithm can compute exact equilibria in relatively large games in a reasonable amount of time. If we are willing to accept (high-quality) approximate equilibria, then we can deal with much larger games using the descent method. We also report on which games are most challenging for each of the algorithms.

preprint · 2016 · arXiv

Unit Vector Games

McLennan and Tourky (2010) showed that "imitation games" provide a new view of the computation of Nash equilibria of bimatrix games with the Lemke-Howson algorithm. In an imitation game, the payoff matrix of one of the players is the identity matrix. We study the more general "unit vector games", which are already known, where the payoff matrix of one player is composed of unit vectors. Our main application is a simplification of the construction by Savani and von Stengel (2006) of bimatrix games where two basic equilibrium-finding algorithms take exponentially many steps: the Lemke-Howson algorithm, and support enumeration.

preprint · 2015 · arXiv

An Empirical Study of Finding Approximate Equilibria in Bimatrix Games

While there have been a number of studies about the efficacy of methods to find exact Nash equilibria in bimatrix games, there has been little empirical work on finding approximate Nash equilibria. Here we provide such a study that compares a number of approximation methods and exact methods. In particular, we explore the trade-off between the quality of approximate equilibrium and the required running time to find one. We found that the existing library GAMUT, which has been the de facto standard that has been used to test exact methods, is insufficient as a test bed for approximation methods since many of its games have pure equilibria or other easy-to-find good approximate equilibria. We extend the breadth and depth of our study by including new interesting families of bimatrix games, and studying bimatrix games up to size $2000 \times 2000$. Finally, we provide new close-to-worst-case examples for the best-performing algorithms for finding approximate Nash equilibria.

preprint · 2015 · arXiv

Computing stable outcomes in symmetric additively-separable hedonic games

We study the computational complexity of finding stable outcomes in hedonic games, which are a class of coalition formation games. We restrict our attention to symmetric additively-separable hedonic games, which are a nontrivial subclass of such games that are guaranteed to possess stable outcomes. These games are specified by an undirected edge-weighted graph: nodes are players, an outcome of the game is a partition of the nodes into coalitions, and the utility of a node is the sum of incident edge weights in the same coalition. We consider several stability requirements defined in the literature. These are based on restricting feasible player deviations, for example, by giving existing coalition members veto power. We extend these restrictions by considering more general forms of preference aggregation for coalition members. In particular, we consider voting schemes to decide if coalition members will allow a player to enter or leave their coalition. For all of the stability requirements we consider, the existence of a stable outcome is guaranteed by a potential function argument, and local improvements will converge to a stable outcome. We provide an almost complete characterization of these games in terms of the tractability of computing such stable outcomes. Our findings comprise positive results in the form of polynomial-time algorithms, and negative (PLS-completeness) results. The negative results extend to more general hedonic games.
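The game model and the potential-function argument in this abstract can be sketched as follows. This is a minimal illustration that assumes the simplest stability notion (a player may move unilaterally to any coalition, with no veto or voting), which is only one of the requirements the paper considers. All names (`weight`, `utility`, `potential`, `local_improvement`) and the singleton starting partition are hypothetical choices for the example.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[int, int], float]


def weight(w: Weights, u: int, v: int) -> float:
    """Symmetric edge weight; a missing edge counts as 0."""
    return w.get((u, v), w.get((v, u), 0.0))


def utility(w: Weights, coalition_of: List[int], player: int, coalition: int) -> float:
    """Sum of edge weights from `player` to the other members of `coalition`."""
    return sum(
        weight(w, player, other)
        for other, c in enumerate(coalition_of)
        if c == coalition and other != player
    )


def potential(w: Weights, coalition_of: List[int]) -> float:
    """Total weight of edges whose two endpoints share a coalition."""
    n = len(coalition_of)
    return sum(
        weight(w, u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if coalition_of[u] == coalition_of[v]
    )


def local_improvement(w: Weights, n: int) -> List[int]:
    """Apply strictly improving unilateral moves until none remains.

    With n coalition labels for n players, an empty label is always
    available to any player who is not already alone, so leaving to form
    a singleton is covered. Each move raises `potential` by exactly the
    mover's gain, so the loop terminates.
    """
    coalition_of = list(range(n))  # assumed start: every player alone
    improved = True
    while improved:
        improved = False
        for p in range(n):
            current = utility(w, coalition_of, p, coalition_of[p])
            best = max(range(n), key=lambda c: utility(w, coalition_of, p, c))
            if utility(w, coalition_of, p, best) > current:
                coalition_of[p] = best
                improved = True
    return coalition_of
```

Termination alone says nothing about speed: the number of improving moves can be very large, which is consistent with the PLS-completeness results the abstract reports for some stability requirements.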

preprint · 2014 · arXiv

Approximate Well-supported Nash Equilibria below Two-thirds

In an epsilon-Nash equilibrium, a player can gain at most epsilon by changing his behaviour. Recent work has addressed the question of how best to compute epsilon-Nash equilibria, and for what values of epsilon a polynomial-time algorithm exists. An epsilon-well-supported Nash equilibrium (epsilon-WSNE) has the additional requirement that any strategy that is used with non-zero probability by a player must have payoff at most epsilon less than the best response. A recent algorithm of Kontogiannis and Spirakis shows how to compute a 2/3-WSNE in polynomial time, for bimatrix games. Here we introduce a new technique that leads to an improvement to the worst-case approximation guarantee.
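The two solution concepts defined in this abstract differ only in what is compared against the best response, which a small sketch can make concrete. The code below assumes a bimatrix game given as two payoff matrices `R` (row player) and `C` (column player) and mixed strategies `x`, `y` as probability lists; these names and the helper functions are illustrative and do not come from the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_payoffs(R: Matrix, y: Sequence[float]) -> List[float]:
    """Expected payoff of each pure row strategy against column mix y."""
    return [sum(R[i][j] * y[j] for j in range(len(y))) for i in range(len(R))]


def col_payoffs(C: Matrix, x: Sequence[float]) -> List[float]:
    """Expected payoff of each pure column strategy against row mix x."""
    return [sum(C[i][j] * x[i] for i in range(len(x))) for j in range(len(C[0]))]


def nash_epsilon(R: Matrix, C: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Smallest epsilon for which (x, y) is an epsilon-Nash equilibrium.

    Compares each player's best-response payoff with the expected payoff
    of the mixed strategy actually played.
    """
    u_row, u_col = row_payoffs(R, y), col_payoffs(C, x)
    gain_row = max(u_row) - sum(x[i] * u_row[i] for i in range(len(x)))
    gain_col = max(u_col) - sum(y[j] * u_col[j] for j in range(len(y)))
    return max(gain_row, gain_col)


def wsne_epsilon(R: Matrix, C: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Smallest epsilon for which (x, y) is an epsilon-WSNE.

    Compares each player's best-response payoff with the worst pure
    strategy that the player uses with non-zero probability.
    """
    u_row, u_col = row_payoffs(R, y), col_payoffs(C, x)
    worst_row = min(u_row[i] for i in range(len(x)) if x[i] > 0)
    worst_col = min(u_col[j] for j in range(len(y)) if y[j] > 0)
    return max(max(u_row) - worst_row, max(u_col) - worst_col)
```

Because the worst supported pure strategy never pays more than the mixed strategy's average, `wsne_epsilon` is always at least `nash_epsilon`, which is why well-supported guarantees are the harder ones to obtain.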

preprint · 2014 · arXiv

Computing Approximate Nash Equilibria in Polymatrix Games

In an $ε$-Nash equilibrium, a player can gain at most $ε$ by unilaterally changing his behaviour. For two-player (bimatrix) games with payoffs in $[0,1]$, the best-known $ε$ achievable in polynomial time is 0.3393. In general, for $n$-player games an $ε$-Nash equilibrium can be computed in polynomial time for an $ε$ that is an increasing function of $n$ but does not depend on the number of strategies of the players. For three-player and four-player games the corresponding values of $ε$ are 0.6022 and 0.7153, respectively. Polymatrix games are a restriction of general $n$-player games where a player's payoff is the sum of payoffs from a number of bimatrix games. There exists a very small but constant $ε$ such that computing an $ε$-Nash equilibrium of a polymatrix game is PPAD-hard. Our main result is that a $(0.5+δ)$-Nash equilibrium of an $n$-player polymatrix game can be computed in time polynomial in the input size and $\frac{1}{δ}$. Inspired by the algorithm of Tsaknakis and Spirakis, our algorithm uses gradient descent on the maximum regret of the players. We also show that this algorithm can be applied to efficiently find a $(0.5+δ)$-Nash equilibrium in a two-player Bayesian game.

preprint · 2014 · arXiv

Finding Approximate Nash Equilibria of Bimatrix Games via Payoff Queries

We study the deterministic and randomized query complexity of finding approximate equilibria in bimatrix games. We show that the deterministic query complexity of finding an $ε$-Nash equilibrium when $ε < \frac{1}{2}$ is $Ω(k^2)$, even in zero-one constant-sum games. In combination with previous results [FGGS13], this provides a complete characterization of the deterministic query complexity of approximate Nash equilibria. We also study randomized querying algorithms. We give a randomized algorithm for finding a $(\frac{3 - \sqrt{5}}{2} + ε)$-Nash equilibrium using $O(\frac{k \cdot \log k}{ε^2})$ payoff queries, which shows that the $\frac{1}{2}$ barrier for deterministic algorithms can be broken by randomization. For well-supported Nash equilibria (WSNE), we first give a randomized algorithm for finding an $ε$-WSNE of a zero-sum bimatrix game using $O(\frac{k \cdot \log k}{ε^4})$ payoff queries, and we then use this to obtain a randomized algorithm for finding a $(\frac{2}{3} + ε)$-WSNE in a general bimatrix game using $O(\frac{k \cdot \log k}{ε^4})$ payoff queries. Finally, we initiate the study of lower bounds against randomized algorithms in the context of bimatrix games, by showing that randomized algorithms require $Ω(k^2)$ payoff queries in order to find a $\frac{1}{6k}$-Nash equilibrium, even in zero-one constant-sum games. In particular, this rules out query-efficient randomized algorithms for finding exact Nash equilibria.

preprint · 2014 · arXiv

Game Theory Explorer - Software for the Applied Game Theorist

This paper presents the "Game Theory Explorer" software tool to create and analyze games as models of strategic interaction. A game in extensive or strategic form is created and nicely displayed with a graphical user interface in a web browser. State-of-the-art algorithms then compute all Nash equilibria of the game after a mouseclick. In tutorial fashion, we present how the program is used, and the ideas behind its main algorithms. We report on experiences with the architecture of the software and its development as an open-source project.

preprint · 2014 · arXiv

Learning Equilibria of Games via Payoff Queries

A recent body of experimental literature has studied empirical game-theoretical analysis, in which we have partial knowledge of a game, consisting of observations of a subset of the pure-strategy profiles and their associated payoffs to players. The aim is to find an exact or approximate Nash equilibrium of the game, based on these observations. It is usually assumed that the strategy profiles may be chosen in an on-line manner by the algorithm. We study a corresponding computational learning model, and the query complexity of learning equilibria for various classes of games. We give basic results for bimatrix and graphical games. Our focus is on symmetric network congestion games. For directed acyclic networks, we can learn the cost functions (and hence compute an equilibrium) while querying just a small fraction of pure-strategy profiles. For the special case of parallel links, we have the stronger result that an equilibrium can be identified while only learning a small fraction of the cost values.

preprint · 2014 · arXiv

Polylogarithmic Supports are required for Approximate Well-Supported Nash Equilibria below 2/3

In an epsilon-approximate Nash equilibrium, a player can gain at most epsilon in expectation by unilateral deviation. An epsilon-well-supported approximate Nash equilibrium has the stronger requirement that every pure strategy used with positive probability must have payoff within epsilon of the best response payoff. Daskalakis, Mehta and Papadimitriou conjectured that every win-lose bimatrix game has a 2/3-well-supported Nash equilibrium that uses supports of cardinality at most three. Indeed, they showed that such an equilibrium will exist subject to the correctness of a graph-theoretic conjecture. Regardless of the correctness of this conjecture, we show that the barrier of a 2/3 payoff guarantee cannot be broken with constant size supports; we construct win-lose games that require supports of cardinality at least Omega((log n)^(1/3)) in any epsilon-well-supported equilibrium with epsilon < 2/3. The key tool in showing the validity of the construction is a proof of a bipartite digraph variant of the well-known Caccetta-Häggkvist conjecture. A probabilistic argument shows that there exist epsilon-well-supported equilibria with supports of cardinality O(log n/(epsilon^2)), for any epsilon > 0; thus, the polylogarithmic cardinality bound presented cannot be greatly improved. We also show that for any delta > 0, there exist win-lose games for which no pair of strategies with support sizes at most two is a (1-delta)-well-supported Nash equilibrium. In contrast, every bimatrix game with payoffs in [0,1] has a 1/2-approximate Nash equilibrium where the supports of the players have cardinality at most two.

preprint · 2014 · arXiv

The Complexity of the Simplex Method

The simplex method is a well-studied and widely-used pivoting method for solving linear programs. When Dantzig originally formulated the simplex method, he gave a natural pivot rule that pivots into the basis a variable with the most violated reduced cost. In their seminal work, Klee and Minty showed that this pivot rule takes exponential time in the worst case. We prove two main results on the simplex method. Firstly, we show that it is PSPACE-complete to find the solution that is computed by the simplex method using Dantzig's pivot rule. Secondly, we prove that deciding whether Dantzig's rule ever chooses a specific variable to enter the basis is PSPACE-complete. We use the known connection between Markov decision processes (MDPs) and linear programming, and an equivalence between Dantzig's pivot rule and a natural variant of policy iteration for average-reward MDPs. We construct MDPs and show PSPACE-completeness results for single-switch policy iteration, which in turn imply our main results for the simplex method.
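For readers unfamiliar with the pivot rule this abstract refers to, here is a minimal sketch of the entering-variable choice only. It assumes a minimisation problem in which a negative reduced cost marks a violated (improving) non-basic variable and breaks ties by lowest index; the sign convention, the tie-break, the tolerance and the function name are all assumptions of this example, not details given in the abstract.

```python
from typing import Optional, Sequence


def dantzig_entering_variable(reduced_costs: Sequence[float], tol: float = 1e-12) -> Optional[int]:
    """Pick the non-basic variable with the most negative reduced cost.

    Returns None when no reduced cost is below -tol, meaning the current
    basis is optimal under the assumed minimisation convention.
    """
    best_index: Optional[int] = None
    best_value = -tol
    for index, cost in enumerate(reduced_costs):
        if cost < best_value:  # strict comparison keeps the lowest index on ties
            best_index, best_value = index, cost
    return best_index
```

A full simplex iteration would follow this with a ratio test to choose the leaving variable, which is omitted here.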

preprint · 2011 · arXiv

On the Approximation Performance of Fictitious Play in Finite Games

We study the performance of Fictitious Play, when used as a heuristic for finding an approximate Nash equilibrium of a 2-player game. We exhibit a class of 2-player games having payoffs in the range [0,1] that show that Fictitious Play fails to find a solution having an additive approximation guarantee significantly better than 1/2. Our construction shows that for n × n games, in the worst case both players may perpetually have mixed strategies whose payoffs fall short of the best response by an additive quantity 1/2 - O(1/n^(1-delta)) for arbitrarily small delta. We also show an essentially matching upper bound of 1/2 - O(1/n).
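To make the heuristic in this abstract concrete, the sketch below runs simultaneous fictitious play on a bimatrix game and measures how far the empirical mixed strategies are from a best response. The starting actions (index 0 for both players), the lowest-index tie-break and all function names are assumptions of this example; the worst-case games constructed in the paper are not reproduced here.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fictitious_play(R: Matrix, C: Matrix, rounds: int) -> Tuple[List[float], List[float]]:
    """Return the empirical mixed strategies after `rounds` rounds (rounds >= 1).

    In each round both players simultaneously best-respond to the
    opponent's empirical action counts so far.
    """
    m, n = len(R), len(R[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] += 1  # assumed arbitrary first actions
    col_counts[0] += 1
    for _ in range(rounds - 1):
        row_br = max(range(m), key=lambda i: sum(R[i][j] * col_counts[j] for j in range(n)))
        col_br = max(range(n), key=lambda j: sum(C[i][j] * row_counts[i] for i in range(m)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return [c / rounds for c in row_counts], [c / rounds for c in col_counts]


def approximation(R: Matrix, C: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Largest additive gain either player gets by switching to a best response."""
    u_row = [sum(R[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    u_col = [sum(C[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    gain_row = max(u_row) - sum(x[i] * u_row[i] for i in range(len(x)))
    gain_col = max(u_col) - sum(y[j] * u_col[j] for j in range(len(y)))
    return max(gain_row, gain_col)
```

Calling `approximation(R, C, *fictitious_play(R, C, rounds))` gives the additive quantity that the abstract bounds; on small constant-sum games it shrinks as `rounds` grows, whereas the paper's construction keeps it close to 1/2.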

preprint · 2011 · arXiv

The Complexity of the Homotopy Method, Equilibrium Selection, and Lemke-Howson Solutions

We show that the widely used homotopy method for solving fixpoint problems, as well as the Harsanyi-Selten equilibrium selection process for games, are PSPACE-complete to implement. Extending our result for the Harsanyi-Selten process, we show that several other homotopy-based algorithms for finding equilibria of games are also PSPACE-complete to implement. A further application of our techniques yields the result that it is PSPACE-complete to compute any of the equilibria that could be found via the classical Lemke-Howson algorithm, a complexity-theoretic strengthening of the result in [Savani and von Stengel]. These results show that our techniques can be widely applied and suggest that the PSPACE-completeness of implementing homotopy methods is a general principle.

preprint · 2009 · arXiv

Linear Complementarity Algorithms for Infinite Games

The performance of two pivoting algorithms, due to Lemke and Cottle and Dantzig, is studied on linear complementarity problems (LCPs) that arise from infinite games, such as parity, average-reward, and discounted games. The algorithms have not been previously studied in the context of infinite games, and they offer alternatives to the classical strategy-improvement algorithms. The two algorithms are described purely in terms of discounted games, thus bypassing the reduction from the games to LCPs, and hence facilitating a better understanding of the algorithms when applied to games. A family of parity games is given, on which both algorithms run in exponential time, indicating that in the worst case they perform no better for parity, average-reward, or discounted games than they do for general P-matrix LCPs.
