Source author record

Tuomas Sandholm

Tuomas Sandholm appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Science and Game Theory Artificial Intelligence Machine Learning Multiagent Systems Data Structures and Algorithms math.OC Cryptography and Security econ.GN eess.SY q-fin.EC Systems and Control

Catalog footprint

What is connected

32works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Domain-Independent Game Abstraction using Word Embedding Techniques

Many games of interest in the real world are often intractably large, thereby necessitating the use of game abstraction to shrink them in size, typically by many magnitudes. Over the last two decades, there have been significant advances in game abstraction; however, the domain-specific nature (usually poker) of much of the prior work prevents those techniques from being easily generalized to other settings without extensively analyzing the game at hand. In this paper, we propose a domain-independent approach to game abstraction, which applies word embedding techniques from the field of natural language processing. Treating each action as a word and gameplay data as a corpus, word vectors can be trained to represent each action as a real-valued vector, which can then be clustered to facilitate game abstraction. We also explore the use of foundational embedding models and show that action embeddings obtained this way can capture a surprising amount of information about the underlying game. Experimental results demonstrate that our proposed game abstraction technique is effective, although it does not outperform specialized algorithms tailored to specific games.

preprint2026arXiv

Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques

How should an agent's performance in a multiagent environment be evaluated when there is a limited sample size or a high cost of running a trial? The AIVAT family of variance reduction techniques was proposed to address this challenge by introducing unbiased low-variance estimators of agents' expected payoffs. An important component of AIVAT is a heuristic value function that discriminates between potentially low- and high-value counterfactual histories. A notable gap in the literature is that there is little to no constraint or guideline on how the heuristic value function should be chosen or how uncertainty in its output should be handled. In our first contribution, we parameterize the heuristic value function to highlight AIVAT's potential vulnerabilities: a) the sample variance can be set pathologically low by directly applying gradient descent on the sample variance, and b) one can p-hack to draw a desired statistical conclusion via gradient descent/ascent on the test statistic. The main takeaway is that the heuristic value function should be fixed prior to observing the evaluation data! In our second contribution, we show how the heuristic uncertainty can be propagated to quantify the uncertainty of AIVAT estimates. It is then possible to further reduce the variance using inverse-variance weighted averaging, but AIVAT's unbiasedness guarantee may have to be sacrificed. In our experiments, we use a dataset of 10,000 poker hands to demonstrate our heuristic pathology and uncertainty results, with the latter yielding a 43.0% reduction in the number of samples (poker hands) needed to draw statistical conclusions.

preprint2026arXiv

Parallelizing Counterfactual Regret Minimization

Parallelization has played an instrumental role in the field of artificial intelligence (AI), drastically reducing the time taken to train and evaluate large AI models. In contrast to its impact in the broader field of AI, applying parallelization to computational game solving is relatively unexplored, despite its great potential. In this paper, we parallelize the family of counterfactual regret minimization (CFR) algorithms, which were central to important breakthroughs for solving large imperfect-information games. We present a generalized parallelization framework, reframing CFR as a series of linear algebra operations. Then, existing techniques for parallelizing linear algebra operations can be applied to accelerate CFR. We also describe how our technique can be applied to other tabular members of the CFR family of algorithms, including the state-of-the-art, such as CFR+, discounted CFR, and predictive variants of CFR. Experimentally, we show that our CFR implementation on a GPU is up to four orders of magnitude faster than Google DeepMind OpenSpiel's CFR implementations on a CPU.

preprint2026arXiv

Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games

Watermarking techniques for large language models (LLMs), which encode hidden information in the output so its source can be verified, have gained significant attention in recent days, thanks to their potential capability to detect accidental or deliberate misuse. Similar challenges involving model misuse also exist in the context of game-playing, such as when detecting the unauthorized use of AI tools in gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but there is a tradeoff between detectability and quality. In our experiments, we bootstrap the watermarking framework to various chess engines and demonstrate that a) the impact of the watermark on the quality of the strategy is negligible and b) the watermark can be detected with just a handful of games.

preprint2022arXiv

Anytime PSRO for Two-Player Zero-Sum Games

Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO), a tabular double oracle algorithm for 2-player zero-sum games that is guaranteed to converge to a Nash equilibrium while decreasing exploitability from one iteration to the next. Unlike DO, in which the restricted distribution is based on the restricted game formed by each player's strategy sets, ADO finds the restricted distribution for each player that minimizes its exploitability against any policy in the full, unrestricted game. We also propose a method of finding this restricted distribution via a no-regret algorithm updated against best responses, called RM-BR DO. Finally, we propose anytime PSRO (APSRO), a version of ADO that calculates best responses via reinforcement learning. In experiments on Leduc poker and random normal form games, we show that our methods achieve far lower exploitability than DO and PSRO and decrease exploitability monotonically.

preprint2022arXiv

Differentiable Economics for Randomized Affine Maximizer Auctions

A recent approach to automated mechanism design, differentiable economics, represents auctions by rich function approximators and optimizes their performance by gradient descent. The ideal auction architecture for differentiable economics would be perfectly strategyproof, support multiple bidders and items, and be rich enough to represent the optimal (i.e. revenue-maximizing) mechanism. So far, such an architecture does not exist. There are single-bidder approaches (MenuNet, RochetNet) which are always strategyproof and can represent optimal mechanisms. RegretNet is multi-bidder and can approximate any mechanism, but is only approximately strategyproof. We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism. This architecture is the classic affine maximizer auction (AMA), modified to offer lotteries. By using the gradient-based optimization tools of differentiable economics, we can now train lottery AMAs, competing with or outperforming prior approaches in revenue.

preprint2022arXiv

Faster No-Regret Learning Dynamics for Extensive-Form Correlated and Coarse Correlated Equilibria

A recent emerging trend in the literature on learning in games has been concerned with providing faster learning dynamics for correlated and coarse correlated equilibria in normal-form games. Much less is known about the significantly more challenging setting of extensive-form games, which can capture both sequential and simultaneous moves, as well as imperfect information. In this paper we establish faster no-regret learning dynamics for \textit{extensive-form correlated equilibria (EFCE)} in multiplayer general-sum imperfect-information extensive-form games. When all players follow our accelerated dynamics, the correlated distribution of play is an $O(T^{-3/4})$-approximate EFCE, where the $O(\cdot)$ notation suppresses parameters polynomial in the description of the game. This significantly improves over the best prior rate of $O(T^{-1/2})$. To achieve this, we develop a framework for performing accelerated \emph{Phi-regret minimization} via predictions. One of our key technical contributions -- that enables us to employ our generic template -- is to characterize the stability of fixed points associated with \emph{trigger deviation functions} through a refined perturbation analysis of a structured Markov chain. Furthermore, for the simpler solution concept of extensive-form \emph{coarse} correlated equilibrium (EFCCE) we give a new succinct closed-form characterization of the associated fixed points, bypassing the expensive computation of stationary distributions required for EFCE. Our results place EFCCE closer to \emph{normal-form coarse correlated equilibria} in terms of the per-iteration complexity, although the former prescribes a much more compelling notion of correlation. Finally, experiments conducted on standard benchmarks corroborate our theoretical findings.

preprint2022arXiv

Improved Sample Complexity Bounds for Branch-and-Cut

Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training set of integer programs from the application domain at hand, the goal is to find a configuration with strong predicted performance on future, unseen integer programs from the same domain. If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs. In this paper, we prove sample complexity guarantees for this procedure, which bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cutting plane selection, and are sharper and more general than those found in prior research.

preprint2022arXiv

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) methods based on the \emph{Double Oracle (DO)} algorithm, such as \emph{Policy Space Response Oracles (PSRO)} and \emph{Anytime PSRO (APSRO)}, iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce \emph{Self-Play PSRO (SP-PSRO)}, a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.

preprint2022arXiv

Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts

The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.

preprint2022arXiv

Team Correlated Equilibria in Zero-Sum Extensive-Form Games via Tree Decompositions

Despite the many recent practical and theoretical breakthroughs in computational game theory, equilibrium finding in extensive-form team games remains a significant challenge. While NP-hard in the worst case, there are provably efficient algorithms for certain families of team game. In particular, if the game has common external information, also known as A-loss recall -- informally, actions played by non-team members (i.e., the opposing team or nature) are either unknown to the entire team, or common knowledge within the team -- then polynomial-time algorithms exist (Kaneko & Kline 1995). In this paper, we devise a completely new algorithm for solving team games. It uses a tree decomposition of the constraint system representing each team's strategy to reduce the number and degree of constraints required for correctness (tightness of the mathematical program). Our approach has the bags of the tree decomposition correspond to team-public states. Our algorithm reduces the problem of solving team games to a linear program with at most $O(NW^{w+1})$ nonzero entries in the constraint matrix, where $N$ is the size of the game tree, $w$ is a parameter that depends on the amount of uncommon external information, and $W$ is the treewidth of the tree decomposition. In public-action games, our program size is bounded by the tighter $2^{O(nt)}N$ for teams of $n$ players with $t$ types each. Our algorithm is based on a new way to write a custom, concise tree decomposition, and its fast run time does not assume that the decomposition has small treewidth. Since our algorithm describes the polytope of correlated strategies directly, we get equilibrium finding in correlated strategies for free -- instead of, say, having to run a double oracle algorithm. We show via experiments on a standard suite of games that our algorithm achieves state-of-the-art performance on all benchmark game classes except one.

preprint2021arXiv

Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment. It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history. Over the past decade, there has been considerable effort into designing online optimization methods for TFSDM. Virtually all of that work has been in the full-feedback setting, where the agent has access to counterfactuals, that is, information on what would have happened had the agent chosen a different action at any decision node. Little is known about the bandit setting, where that assumption is reversed (no counterfactual information is available), despite this latter setting being well understood for almost 20 years in one-shot decision making. In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations (in the size of the decision tree) and (ii) $O(\sqrt{T})$ cumulative regret in expectation compared to any fixed strategy, at all times $T$. This is made possible by new results that we derive, which may have independent uses as well: 1) geometry of the dilated entropy regularizer, 2) autocorrelation matrix of the natural sampling scheme for sequence-form strategies, 3) construction of an unbiased estimator for linear losses for sequence-form strategies, and 4) a refined regret analysis for mirror descent when using the dilated entropy regularizer.

preprint2021arXiv

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

Blackwell approachability is a framework for reasoning about repeated games with vector-valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the next payoff vector is given, and the decision maker tries to achieve better performance based on the accuracy of that estimator. In order to derive algorithms that achieve predictive Blackwell approachability, we start by showing a powerful connection between four well-known algorithms. Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization. In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework). We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game. By applying the predictive variants of FTRL or OMD to this connection, we obtain predictive Blackwell approachability algorithms, as well as predictive variants of RM and RM+. In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the poker games, sometimes by two or more orders of magnitude.

preprint2021arXiv

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player's decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs -- even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used -- for example, those in which only black-box access to the environment is available. We give the first, to our knowledge, regret-minimization algorithm that guarantees sublinear regret with high probability even when requirement (i) -- and thus also (ii) -- is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and gets revealed incrementally whenever the agent encounters new decision points. We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment. Our experiments show it significantly outperforms the prior algorithms for the problem, which do not have such guarantees. It can be used in any application for which regret minimization is useful: approximating Nash equilibrium or quantal response equilibrium, approximating coarse correlated equilibrium in multi-player games, learning a best response, learning safe opponent exploitation, and online play against an unknown opponent/environment.

preprint2020arXiv

Efficient exploration of zero-sum stochastic games

We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay, such as in financial or military simulations and computer games. During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well. After that, the algorithm has to produce a strategy that has low exploitability. Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly. For the stochastic game setting, we propose using the distribution of state-action value functions induced by a belief distribution over possible environments. We compare the performance of various exploration strategies for this task, including generalizations of Thompson sampling and Bayes-UCB to this new setting. These two consistently outperform other strategies.

preprint2020arXiv

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.

preprint2020arXiv

Limited Lookahead in Imperfect-Information Games

Limited lookahead has been studied for decades in perfect-information games. We initiate a new direction via two simultaneous deviation points: generalization to imperfect-information games and a game-theoretic approach. We study how one should act when facing an opponent whose lookahead is limited. We study this for opponents that differ based on their lookahead depth, based on whether they, too, have imperfect information, and based on how they break ties. We characterize the hardness of finding a Nash equilibrium or an optimal commitment strategy for either player, showing that in some of these variations the problem can be solved in polynomial time while in others it is PPAD-hard, NP-hard, or inapproximable. We proceed to design algorithms for computing optimal commitment strategies---for when the opponent breaks ties favorably, according to a fixed rule, or adversarially. We then experimentally investigate the impact of limited lookahead. The limited-lookahead player often obtains the value of the game if she knows the expected values of nodes in the game tree for some equilibrium---but we prove this is not sufficient in general. Finally, we study the impact of noise in those estimates and different lookahead depths.

preprint2020arXiv

Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

Unlike normal-form games, where correlated equilibria have been studied for more than 45 years, extensive-form correlation is still generally not well understood. Part of the reason for this gap is that the sequential nature of extensive-form games allows for a richness of behaviors and incentives that are not possible in normal-form settings. This richness translates to a significantly different complexity landscape surrounding extensive-form correlated equilibria. As of today, it is known that finding an optimal extensive-form correlated equilibrium (EFCE), extensive-form coarse correlated equilibrium (EFCCE), or normal-form coarse correlated equilibrium (NFCCE) in a two-player extensive-form game is computationally tractable when the game does not include chance moves, and intractable when the game involves chance moves. In this paper we significantly refine this complexity threshold by showing that, in two-player games, an optimal correlated equilibrium can be computed in polynomial time, provided that a certain condition is satisfied. We show that the condition holds, for example, when all chance moves are public, that is, both players observe all chance moves. This implies that an optimal EFCE, EFCCE and NFCCE can be computed in polynomial time in the game size in two-player games with public chance moves, providing the biggest positive complexity result surrounding extensive-form correlation in more than a decade.

preprint2020arXiv

Sparsified Linear Programming for Zero-Sum Equilibrium Finding

Computational equilibrium finding in large zero-sum extensive-form imperfect-information games has led to significant recent AI breakthroughs. The fastest algorithms for the problem are new forms of counterfactual regret minimization [Brown and Sandholm, 2019]. In this paper we present a totally different approach to the problem, which is competitive and often orders of magnitude better than the prior state of the art. The equilibrium-finding problem can be formulated as a linear program (LP) [Koller et al., 1994], but solving it as an LP has not been scalable due to the memory requirements of LP solvers, which can often be quadratically worse than CFR-based algorithms. We give an efficient practical algorithm that factors a large payoff matrix into a product of two matrices that are typically dramatically sparser. This allows us to express the equilibrium-finding problem as a linear program with size only a logarithmic factor worse than CFR, and thus allows linear program solvers to run on such games. With experiments on poker endgames, we demonstrate in practice, for the first time, that modern linear program solvers are competitive against even game-specific modern variants of CFR in solving large extensive-form games, and can be used to compute exact solutions unlike iterative algorithms like CFR.

preprint2020arXiv

Stochastic Regret Minimization in Extensive-Form Games

Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals. It works by using gradient estimates that can be computed via sampling. However, stochastic methods for sequential games have not been investigated extensively beyond MCCFR. In this paper we develop a new framework for developing stochastic regret minimization methods. This framework allows us to use any regret-minimization algorithm, coupled with any gradient estimator. The MCCFR algorithm can be analyzed as a special case of our framework, and this analysis leads to significantly-stronger theoretical on convergence, while simultaneously yielding a simplified proof. Our framework allows us to instantiate several new stochastic methods for solving sequential games. We show extensive experiments on three games, where some variants of our methods outperform MCCFR.

preprint2016arXiv

Hardness of the Pricing Problem for Chains in Barter Exchanges

Kidney exchange is a barter market where patients trade willing but medically incompatible donors. These trades occur via cycles, where each patient-donor pair both gives and receives a kidney, and via chains, which begin with an altruistic donor who does not require a kidney in return. For logistical reasons, the maximum length of a cycle is typically limited to a small constant, while chains can be much longer. Given a compatibility graph of patient-donor pairs, altruists, and feasible potential transplants between them, finding even a maximum-cardinality set of vertex-disjoint cycles and chains is NP-hard. There has been much work on developing provably optimal solvers that are efficient in practice. One of the leading techniques has been branch and price, where column generation is used to incrementally bring cycles and chains into the optimization model on an as-needed basis. In particular, only positive-price columns need to be brought into the model. We prove that finding a positive-price chain is NP-complete. This shows incorrectness of two leading branch-and-price solvers that suggested polynomial-time chain pricing algorithms.

preprint2016arXiv

Imperfect-Recall Abstractions with Bounds in Games

Imperfect-recall abstraction has emerged as the leading paradigm for practical large-scale equilibrium computation in incomplete-information games. However, imperfect-recall abstractions are poorly understood, and only weak algorithm-specific guarantees on solution quality are known. In this paper, we show the first general, algorithm-agnostic, solution quality guarantees for Nash equilibria and approximate self-trembling equilibria computed in imperfect-recall abstractions, when implemented in the original (perfect-recall) game. Our results are for a class of games that generalizes the only previously known class of imperfect-recall abstractions where any results had been obtained. Further, our analysis is tighter in two ways, each of which can lead to an exponential reduction in the solution quality error bound. We then show that for extensive-form games that satisfy certain properties, the problem of computing a bound-minimizing abstraction for a single level of the game reduces to a clustering problem, where the increase in our bound is the distance function. This reduction leads to the first imperfect-recall abstraction algorithm with solution quality bounds. We proceed to show a divide in the class of abstraction problems. If payoffs are at the same scale at all information sets considered for abstraction, the input forms a metric space. Conversely, if this condition is not satisfied, we show that the input does not form a metric space. Finally, we use these results to experimentally investigate the quality of our bound for single-level abstraction.

preprint2016arXiv

Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games. Regret-Based Pruning (RBP) is an improvement that allows poorly-performing actions to be temporarily pruned, thus speeding up CFR. We introduce Total RBP, a new form of RBP that reduces the space requirements of CFR as actions are pruned. We prove that in zero-sum games it asymptotically prunes any action that is not part of a best response to some Nash equilibrium. This leads to provably faster convergence and lower space requirements. Experiments show that Total RBP results in an order of magnitude reduction in space, and the reduction factor increases with game size.

preprint2016arXiv

Sample Complexity of Automated Mechanism Design

The design of revenue-maximizing combinatorial auctions, i.e. multi-item auctions over bundles of goods, is one of the most fundamental problems in computational economics, unsolved even for two bidders and two items for sale. In the traditional economic models, it is assumed that the bidders' valuations are drawn from an underlying distribution and that the auction designer has perfect knowledge of this distribution. Despite this strong and oftentimes unrealistic assumption, it is remarkable that the revenue-maximizing combinatorial auction remains unknown. In recent years, automated mechanism design has emerged as one of the most practical and promising approaches to designing high-revenue combinatorial auctions. The most scalable automated mechanism design algorithms take as input samples from the bidders' valuation distribution and then search for a high-revenue auction in a rich auction class. In this work, we provide the first sample complexity analysis for the standard hierarchy of deterministic combinatorial auction classes used in automated mechanism design. In particular, we provide tight sample complexity bounds on the number of samples needed to guarantee that the empirical revenue of the designed mechanism on the samples is close to its expected revenue on the underlying, unknown distribution over bidder valuations, for each of the auction classes in the hierarchy. In addition to helping set automated mechanism design on firm foundations, our results also push the boundaries of learning theory. In particular, the hypothesis functions used in our contexts are defined through multi-stage combinatorial optimization procedures, rather than simple decision boundaries, as are common in machine learning.

preprint2016arXiv

Small Representations of Big Kidney Exchange Graphs

Kidney exchanges are organized markets where patients swap willing but incompatible donors. In the last decade, kidney exchanges grew from small and regional to large and national---and soon, international. This growth results in more lives saved, but exacerbates the empirical hardness of the $\mathcal{NP}$-complete problem of optimally matching patients to donors. State-of-the-art matching engines use integer programming techniques to clear fielded kidney exchanges, but these methods must be tailored to specific models and objective functions, and may fail to scale to larger exchanges. In this paper, we observe that if the kidney exchange compatibility graph can be encoded by a constant number of patient and donor attributes, the clearing problem is solvable in polynomial time. We give necessary and sufficient conditions for losslessly shrinking the representation of an arbitrary compatibility graph. Then, using real compatibility graphs from the UNOS nationwide kidney exchange, we show how many attributes are needed to encode real compatibility graphs. The experiments show that, indeed, small numbers of attributes suffice.

preprint2015arXiv

Ignorance is Almost Bliss: Near-Optimal Stochastic Matching With Few Queries

The stochastic matching problem deals with finding a maximum matching in a graph whose edges are unknown but can be accessed via queries. This is a special case of stochastic $k$-set packing, where the problem is to find a maximum packing of sets, each of which exists with some probability. In this paper, we provide edge and set query algorithms for these two problems, respectively, that provably achieve some fraction of the omniscient optimal solution. Our main theoretical result for the stochastic matching (i.e., $2$-set packing) problem is the design of an \emph{adaptive} algorithm that queries only a constant number of edges per vertex and achieves a $(1-ε)$ fraction of the omniscient optimal solution, for an arbitrarily small $ε>0$. Moreover, this adaptive algorithm performs the queries in only a constant number of rounds. We complement this result with a \emph{non-adaptive} (i.e., one round of queries) algorithm that achieves a $(0.5 - ε)$ fraction of the omniscient optimum. We also extend both our results to stochastic $k$-set packing by designing an adaptive algorithm that achieves a $(\frac{2}{k} - ε)$ fraction of the omniscient optimal solution, again with only $O(1)$ queries per element. This guarantee is close to the best known polynomial-time approximation ratio of $\frac{3}{k+1} -ε$ for the \emph{deterministic} $k$-set packing problem [Furer and Yu, 2013] We empirically explore the application of (adaptations of) these algorithms to the kidney exchange problem, where patients with end-stage renal failure swap willing but incompatible donors. We show on both generated data and on real data from the first 169 match runs of the UNOS nationwide kidney exchange that even a very small number of non-adaptive edge queries per vertex results in large gains in expected successful matches.

preprint2015arXiv

Strong Nash equilibria and mixed strategies

In this paper we consider strong Nash equilibria, in mixed strategies, for finite games. Any strong Nash equilibrium outcome is Pareto efficient for each coalition. First, we analyze the two--player setting. Our main result, in its simplest form, states that if a game has a strong Nash equilibrium with full support (that is, both players randomize among all pure strategies), then the game is strictly competitive. In order to get our result we use the indifference principle fulfilled by any Nash equilibrium, and the classical KKT conditions (in the vector setting), that are necessary conditions for Pareto efficiency. Our characterization enables us to design a strong-Nash-equilibrium-finding algorithm with complexity in Smoothed-$\mathcal{P}$. So, this problem---that Conitzer and Sandholm [Conitzer, V., Sandholm, T., 2008. New complexity results about Nash equilibria. Games Econ. Behav. 63, 621--641] proved to be computationally hard in the worst case---is generically easy. Hence, although the worst case complexity of finding a strong Nash equilibrium is harder than that of finding a Nash equilibrium, once small perturbations are applied, finding a strong Nash is easier than finding a Nash equilibrium. Next we switch to the setting with more than two players. We demonstrate that a strong Nash equilibrium can exist in which an outcome that is strictly Pareto dominated by a Nash equilibrium occurs with positive probability. Finally, we prove that games that have a strong Nash equilibrium where at least one player puts positive probability on at least two pure strategies are extremely rare: they are of zero measure.

preprint2014arXiv

Algorithms for Closed Under Rational Behavior (CURB) Sets

We provide a series of algorithms demonstrating that solutions according to the fundamental game-theoretic solution concept of closed under rational behavior (CURB) sets in two-player, normal-form games can be computed in polynomial time (we also discuss extensions to n-player games). First, we describe an algorithm that identifies all of a player's best responses conditioned on the belief that the other player will play from within a given subset of its strategy space. This algorithm serves as a subroutine in a series of polynomial-time algorithms for finding all minimal CURB sets, one minimal CURB set, and the smallest minimal CURB set in a game. We then show that the complexity of finding a Nash equilibrium can be exponential only in the size of a game's smallest CURB set. Related to this, we show that the smallest CURB set can be an arbitrarily small portion of the game, but it can also be arbitrarily larger than the supports of its only enclosed Nash equilibrium. We test our algorithms empirically and find that most commonly studied academic games tend to have either very large or very small minimal CURB sets.

preprint2014arXiv

Complexity of Mechanism Design

The aggregation of conflicting preferences is a central problem in multiagent systems. The key difficulty is that the agents may report their preferences insincerely. Mechanism design is the art of designing the rules of the game so that the agents are motivated to report their preferences truthfully and a (socially) desirable outcome is chosen. We propose an approach where a mechanism is automatically created for the preference aggregation setting at hand. This has several advantages, but the downside is that the mechanism design optimization problem needs to be solved anew each time. Focusing on settings where side payments are not possible, we show that the mechanism design problem is NP-complete for deterministic mechanisms. This holds both for dominant-strategy implementation and for Bayes-Nash implementation. We then show that if we allow randomized mechanisms, the mechanism design problem becomes tractable. In other words, the coordinator can tackle the computational complexity introduced by its uncertainty about the agents preferences BY making the agents face additional uncertainty.This comes at no loss, AND IN SOME cases at a gain, IN the(social) objective.

preprint2013arXiv

On the complexity of strong Nash equilibrium: Hard-to-solve instances and smoothed complexity

The computational characterization of game-theoretic solution concepts is a central topic in artificial intelligence, with the aim of developing computationally efficient tools for finding optimal ways to behave in strategic interactions. The central solution concept in game theory is Nash equilibrium (NE). However, it fails to capture the possibility that agents can form coalitions (even in the 2-agent case). Strong Nash equilibrium (SNE) refines NE to this setting. It is known that finding an SNE is NP-complete when the number of agents is constant. This hardness is solely due to the existence of mixed-strategy SNEs, given that the problem of enumerating all pure-strategy SNEs is trivially in P. Our central result is that, in order for a game to have at least one non-pure-strategy SNE, the agents' payoffs restricted to the agents' supports must, in the case of 2 agents, lie on the same line, and, in the case of n agents, lie on an (n - 1)-dimensional hyperplane. Leveraging this result, we provide two contributions. First, we develop worst-case instances for support-enumeration algorithms. These instances have only one SNE and the support size can be chosen to be of any size-in particular, arbitrarily large. Second, we prove that, unlike NE, finding an SNE is in smoothed polynomial time: generic game instances (i.e., all instances except knife-edge cases) have only pure-strategy SNEs.

preprint2012arXiv

Combining local search techniques and path following for bimatrix games

Computing a Nash equilibrium (NE) is a central task in computer science. An NE is a particularly appropriate solution concept for two-agent settings because coalitional deviations are not an issue. However, even in this case, finding an NE is PPAD-complete. In this paper, we combine path following algorithms with local search techniques to design new algorithms for finding exact and approximate NEs. We show that our algorithms largely outperform the state of the art and that almost all the known benchmark game classes are easily solvable or approximable (except for the GAMUT CovariantGameRand class).

preprint2012arXiv

Common Voting Rules as Maximum Likelihood Estimators

Voting is a very general method of preference aggregation. A voting rule takes as input every voter's vote (typically, a ranking of the alternatives), and produces as output either just the winning alternative or a ranking of the alternatives. One potential view of voting is the following. There exists a 'correct' outcome (winner/ranking), and each voter's vote corresponds to a noisy perception of this correct outcome. If we are given the noise model, then for any vector of votes, we can

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2605.14283:author:3:tuomas-sandholm

Imported May 20, 2026Synced May 20, 2026

arxivconfidence 95%

external id: arxiv:2605.14277:author:2:tuomas-sandholm

Imported May 20, 2026Synced May 20, 2026

arxivconfidence 95%

external id: arxiv:2605.14261:author:2:tuomas-sandholm

Imported May 20, 2026Synced May 20, 2026

arxivconfidence 95%

external id: arxiv:2605.15543:author:2:tuomas-sandholm

Imported May 20, 2026Synced May 20, 2026

7 works

Gabriele Farina

Researcher

Gabriele Farina contributes to research discovery and scholarly infrastructure.

Open to collaborate

5 works

Christian Kroer

Researcher

Christian Kroer contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Juho Kim

Researcher

Juho Kim contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Nicola Gatti

Researcher

Nicola Gatti contributes to research discovery and scholarly infrastructure.

Open to collaborate

Tuomas Sandholm

What is connected

Connect this record

See the researcher in context

Building this map preview

32 published item(s)

Domain-Independent Game Abstraction using Word Embedding Techniques

Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques

Parallelizing Counterfactual Regret Minimization

Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games

Anytime PSRO for Two-Player Zero-Sum Games

Differentiable Economics for Randomized Affine Maximizer Auctions

Faster No-Regret Learning Dynamics for Extensive-Form Correlated and Coarse Correlated Equilibria

Improved Sample Complexity Bounds for Branch-and-Cut

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts

Team Correlated Equilibria in Zero-Sum Extensive-Form Games via Tree Decompositions

Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Efficient exploration of zero-sum stochastic games

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

Limited Lookahead in Imperfect-Information Games

Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

Sparsified Linear Programming for Zero-Sum Equilibrium Finding

Stochastic Regret Minimization in Extensive-Form Games

Hardness of the Pricing Problem for Chains in Barter Exchanges

Imperfect-Recall Abstractions with Bounds in Games

Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

Sample Complexity of Automated Mechanism Design

Small Representations of Big Kidney Exchange Graphs

Ignorance is Almost Bliss: Near-Optimal Stochastic Matching With Few Queries

Strong Nash equilibria and mixed strategies

Algorithms for Closed Under Rational Behavior (CURB) Sets

Complexity of Mechanism Design

On the complexity of strong Nash equilibrium: Hard-to-solve instances and smoothed complexity

Combining local search techniques and path following for bimatrix games

Common Voting Rules as Maximum Likelihood Estimators