Source author record

Amy Greenwald

Amy Greenwald appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Science and Game Theory, Artificial Intelligence, econ.TH, Machine Learning

Catalog footprint

What is connected

13 works

4 topics

4 close collaborators

preprint, 2022, arXiv

A Consumer-Theoretic Characterization of Fisher Market Equilibria

In this paper, we bring consumer theory to bear in the analysis of Fisher markets whose buyers have arbitrary continuous, concave, homogeneous (CCH) utility functions representing locally non-satiated preferences. The main tools we use are the dual concepts of expenditure minimization and indirect utility maximization. First, we use expenditure functions to construct a new convex program whose dual, like the dual of the Eisenberg-Gale program, characterizes the equilibrium prices of CCH Fisher markets. We then prove that the subdifferential of the dual of our convex program is equal to the negative excess demand in the associated market, which makes generalized gradient descent equivalent to computing equilibrium prices via tâtonnement. Finally, we run a series of experiments which suggest that tâtonnement may converge at a rate of $O\left(\frac{(1+E)}{t^2}\right)$ in CCH Fisher markets that comprise buyers with elasticity of demand bounded by $E$. Our novel characterization of equilibrium prices may provide a path to proving the convergence of tâtonnement in Fisher markets beyond those in which buyers' utilities exhibit constant elasticity of substitution.
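The link between gradient descent and tâtonnement described in this abstract can be illustrated with a minimal sketch. It is not the paper's program or experimental setup: it assumes Cobb-Douglas buyers (one simple CCH utility), unit supply of each good, a fixed step size, and hypothetical names (`excess_demand`, `tatonnement`, `shares`). Prices move in the direction of excess demand, which is a descent step when the gradient is the negative excess demand.

```python
def excess_demand(prices, budgets, shares):
    """Excess demand per good, assuming Cobb-Douglas buyers and unit supply.

    A Cobb-Douglas buyer with budget b and share a_j on good j demands
    b * a_j / p_j units of that good (the shares of each buyer sum to 1).
    """
    z = []
    for j in range(len(prices)):
        demand_j = sum(b * a[j] / prices[j] for b, a in zip(budgets, shares))
        z.append(demand_j - 1.0)  # supply of every good is assumed to be 1
    return z


def tatonnement(budgets, shares, prices, step=0.1, iters=2000):
    """Raise the price of over-demanded goods and lower it for the rest."""
    for _ in range(iters):
        z = excess_demand(prices, budgets, shares)
        # Keep prices strictly positive so demand stays well defined.
        prices = [max(p + step * zj, 1e-9) for p, zj in zip(prices, z)]
    return prices


# Illustrative market (assumed values): two buyers, two goods.
budgets = [1.0, 2.0]
shares = [[0.5, 0.5], [0.25, 0.75]]
prices = tatonnement(budgets, shares, [1.5, 1.5])
print(prices)
```

In this toy market the equilibrium price of each good equals the total budget spent on it, so the iterates can be checked against the closed form `sum(b * a[j])`.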

preprint, 2022, arXiv

Computational and Data Requirements for Learning Generic Properties of Simulation-Based Games

Empirical game-theoretic analysis (EGTA) is primarily focused on learning the equilibria of simulation-based games. Recent approaches have tackled this problem by learning a uniform approximation of the game's utilities, and then applying precision-recall theorems: i.e., all equilibria of the true game are approximate equilibria in the estimated game, and vice-versa. In this work, we generalize this approach to all game properties that are well behaved (i.e., Lipschitz continuous in utilities), including regret (which defines Nash and correlated equilibria), adversarial values, and power-mean and Gini social welfare. Further, we introduce a novel algorithm -- progressive sampling with pruning (PSP) -- for learning a uniform approximation and thus any well-behaved property of a game, which prunes strategy profiles once the corresponding players' utilities are well-estimated, and we analyze its data and query complexities in terms of the a priori unknown utility variances. We experiment with our algorithm extensively, showing that 1) the number of queries that PSP saves is highly sensitive to the utility variance distribution, and 2) PSP consistently outperforms theoretical upper bounds, achieving significantly lower query complexities than natural baselines. We conclude with experiments that uncover some of the remaining difficulties with learning properties of simulation-based games, in spite of recent advances in statistical EGTA methodology, including those developed herein.
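The claim that regret is well behaved (Lipschitz in utilities) can be made concrete with a small sketch. It assumes a two-player normal-form game stored as a dictionary, and the helper name `max_regret` and the perturbation pattern are hypothetical choices for illustration. If every utility is estimated to within `eps`, the regret of any profile moves by at most `2 * eps`.

```python
def max_regret(utils, profile):
    """Largest gain any player obtains by deviating unilaterally from `profile`.

    `utils` maps each pure profile (a1, a2) to a payoff tuple (u1, u2).
    """
    actions = (sorted({p[0] for p in utils}), sorted({p[1] for p in utils}))
    worst = 0.0
    for player in (0, 1):
        current = utils[profile][player]
        for a in actions[player]:
            deviated = list(profile)
            deviated[player] = a
            worst = max(worst, utils[tuple(deviated)][player] - current)
    return worst


# Prisoner's dilemma (0 = cooperate, 1 = defect); payoffs are assumed values.
true_game = {
    (0, 0): (3.0, 3.0), (0, 1): (0.0, 5.0),
    (1, 0): (5.0, 0.0), (1, 1): (1.0, 1.0),
}

# An estimated game whose utilities are all within eps of the true ones.
eps = 0.1
estimated_game = {
    p: tuple(u + (eps if sum(p) % 2 == 0 else -eps) for u in us)
    for p, us in true_game.items()
}

for profile in true_game:
    gap = abs(max_regret(true_game, profile) - max_regret(estimated_game, profile))
    print(profile, gap <= 2 * eps + 1e-9)
```

A profile with zero regret is a pure Nash equilibrium, so a uniform approximation of the utilities preserves equilibria up to `2 * eps`.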

preprint, 2022, arXiv

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.

preprint, 2022, arXiv

Gradient Descent Ascent in Min-Max Stackelberg Games

Min-max optimization problems (i.e., min-max games) have attracted a great deal of attention recently as their applicability to a wide range of machine learning problems has become evident. In this paper, we study min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., Stackelberg, games, for which the relevant solution concept is Stackelberg equilibrium, a generalization of Nash. One of the most popular algorithms for solving min-max games is gradient descent ascent (GDA). We present a straightforward generalization of GDA to min-max Stackelberg games with dependent strategy sets, but show that it may not converge to a Stackelberg equilibrium. We then introduce two variants of GDA, which assume access to a solution oracle for the optimal Karush-Kuhn-Tucker (KKT) multipliers of the games' constraints. We show that such an oracle exists for a large class of convex-concave min-max Stackelberg games, and provide proof that our GDA variants with such an oracle converge in $O(\frac{1}{\varepsilon^2})$ iterations to an $\varepsilon$-Stackelberg equilibrium, improving on the most efficient algorithms currently known which converge in $O(\frac{1}{\varepsilon^3})$ iterations. We then show that solving Fisher markets, a canonical example of a min-max Stackelberg game, using our novel algorithm, corresponds to buyers and sellers using myopic best-response dynamics in a repeated market, allowing us to prove the convergence of these dynamics in $O(\frac{1}{\varepsilon^2})$ iterations in Fisher markets. We close by describing experiments on Fisher markets which suggest potential ways to extend our theoretical results, by demonstrating how different properties of the objective function can affect the convergence and convergence rate of our algorithms.
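For orientation, plain simultaneous GDA can be sketched as follows. This is the standard algorithm on a game with independent strategy sets, not the paper's Stackelberg variants or its KKT-multiplier oracle; the objective $f(x, y) = x^2 + 2xy - y^2$, the step size, and the function name `gda` are assumptions made for the example.

```python
def gda(grad_x, grad_y, x, y, step=0.1, iters=500):
    """Simultaneous gradient descent (in x) and ascent (in y)."""
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - step * gx, y + step * gy
    return x, y


# f(x, y) = x^2 + 2xy - y^2 is convex in x and concave in y,
# with its unique saddle point at (0, 0).
x_star, y_star = gda(
    grad_x=lambda x, y: 2 * x + 2 * y,   # df/dx
    grad_y=lambda x, y: 2 * x - 2 * y,   # df/dy
    x=1.0,
    y=1.0,
)
print(x_star, y_star)
```

With dependent strategy sets the inner player's feasible set changes with `x`, which is why, per the abstract, this direct update may fail to reach a Stackelberg equilibrium.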

preprint, 2022, arXiv

Hindsight and Sequential Rationality of Correlated Play

Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.

preprint, 2022, arXiv

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that of SPG; however, we show that the Monte Carlo updates differ in a substantial way: the importance correction accounting for a sampled action is nullified in the SPG update, but not in the NeuRD update. Naturally, this causes the NeuRD update to have higher variance than its SPG counterpart. Building on implicit exploration algorithms in the adversarial bandit setting, we introduce capped implicit exploration (CIX) estimates that allow us to construct NeuRD-CIX, which interpolates between this aspect of NeuRD and SPG. We show how CIX estimates can be used in a black-box reduction to construct bandit algorithms with regret bounds that hold with high probability and the benefits this entails for NeuRD-CIX in sequential decision-making settings. Our analysis reveals a bias-variance tradeoff between SPG and NeuRD, and shows how theory predicts that NeuRD-CIX will perform well more consistently than NeuRD while retaining NeuRD's advantages over SPG in non-stationary environments.

preprint, 2022, arXiv

Robust No-Regret Learning in Min-Max Stackelberg Games

The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e., zero-sum) games. In this paper, we investigate the behavior of no-regret learning in min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., min-max Stackelberg, games. We consider two settings, one in which only the first player chooses their actions using a no-regret algorithm while the second player best responds, and one in which both players use no-regret algorithms. For the former case, we show that no-regret dynamics converge to a Stackelberg equilibrium. For the latter case, we introduce a new type of regret, which we call Lagrangian regret, and show that if both players minimize their Lagrangian regrets, then play converges to a Stackelberg equilibrium. We then observe that online mirror descent (OMD) dynamics in these two settings correspond respectively to a known nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new simultaneous GDA-like algorithm, thereby establishing convergence of these algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of OMD dynamics to perturbations by investigating online min-max Stackelberg games. We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets. In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy sets.

preprint, 2021, arXiv

Learning Competitive Equilibria in Noisy Combinatorial Markets

We present a methodology to robustly estimate the competitive equilibria (CE) of combinatorial markets under the assumption that buyers do not know their precise valuations for bundles of goods, but instead can only provide noisy estimates. We first show tight lower- and upper-bounds on the buyers' utility loss, and hence the set of CE, given a uniform approximation of one market by another. We then develop a learning framework for our setup, and present two probably-approximately-correct algorithms for learning CE, i.e., producing uniform approximations that preserve CE, with finite-sample guarantees. The first is a baseline that uses Hoeffding's inequality to produce a uniform approximation of buyers' valuations with high probability. The second leverages a connection between the first welfare theorem of economics and uniform approximations to adaptively prune value queries when it determines that they are provably not part of a CE. We experiment with our algorithms and find that the pruning algorithm achieves better estimates than the baseline with far fewer samples.
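The Hoeffding-based baseline mentioned above can be sketched as a sample-size calculation. This is a generic use of Hoeffding's inequality with a union bound, not the paper's exact procedure; the function names and the assumption that every noisy valuation lies in an interval of width `value_range` are illustrative.

```python
import math


def hoeffding_sample_size(eps, delta, value_range, num_queries):
    """Samples per query so that all `num_queries` empirical means are
    within `eps` of their true means with probability at least 1 - delta.

    Assumes i.i.d. samples bounded in an interval of width `value_range`.
    """
    return math.ceil(
        value_range ** 2 * math.log(2 * num_queries / delta) / (2 * eps ** 2)
    )


def hoeffding_failure_bound(m, eps, value_range, num_queries):
    """Union bound on the probability that some estimate is off by >= eps."""
    return num_queries * 2 * math.exp(-2 * m * eps ** 2 / value_range ** 2)


# Illustrative numbers (assumed): 8 bundle valuations, each in [0, 1].
m = hoeffding_sample_size(eps=0.1, delta=0.05, value_range=1.0, num_queries=8)
print(m, hoeffding_failure_bound(m, 0.1, 1.0, 8))
```

Because the bound treats every query identically, it cannot exploit queries that are provably irrelevant to any CE, which is the gap the adaptive pruning algorithm in the abstract targets.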

preprint, 2016, arXiv

Optimal Auctions with Convex Perceived Payments

Myerson derived a simple and elegant solution to the single-parameter revenue-maximization problem in his seminal work on optimal auction design assuming the usual model of quasi-linear utilities. In this paper, we consider a slight generalization of this usual model, from linear to convex "perceived" payments. This more general problem does not appear to admit a solution as simple and elegant as Myerson's. While some of Myerson's results extend to our setting, like his payment formula (suitably adjusted), others do not. For example, we observe that the solutions to the Bayesian and the robust (i.e., non-Bayesian) optimal auction design problems in the convex perceived payment setting do not coincide like they do in the case of linear payments. We therefore study the two problems in turn. We derive an upper and a heuristic lower bound on expected revenue in our setting. These bounds are easily computed pointwise, and yield monotonic allocation rules, so can be supported by Myerson payments (suitably adjusted). In this way, our bounds yield heuristics that approximate the optimal robust auction, assuming convex perceived payments. We close with experiments, the final set of which massages the output of one of the closed-form heuristics for the robust problem into an extremely fast, near-optimal heuristic solution to the Bayesian optimal auction design problem.

preprint, 2014, arXiv

RoxyBot-06: Stochastic Prediction and Optimization in TAC Travel

In this paper, we describe our autonomous bidding agent, RoxyBot, who emerged victorious in the travel division of the 2006 Trading Agent Competition in a photo finish. At a high level, the design of many successful trading agents can be summarized as follows: (i) price prediction: build a model of market prices; and (ii) optimization: solve for an approximately optimal set of bids, given this model. To predict, RoxyBot builds a stochastic model of market prices by simulating simultaneous ascending auctions. To optimize, RoxyBot relies on the sample average approximation method, a stochastic optimization technique.
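The predict-then-optimize pattern described above can be illustrated with a toy sample average approximation. This is not RoxyBot's model: it assumes a single good, a bidder who pays the sampled price when the bid is at least that price, and hypothetical names (`saa_best_bid`, `sampled_prices`).

```python
def saa_best_bid(candidate_bids, sampled_prices, value):
    """Pick the bid with the highest average profit over sampled prices.

    Sample average approximation: the expectation over uncertain prices
    is replaced by an average over scenarios drawn from a price model.
    """
    def average_profit(bid):
        total = 0.0
        for price in sampled_prices:
            if bid >= price:            # the bid wins in this scenario
                total += value - price  # pay the scenario price
        return total / len(sampled_prices)

    return max(candidate_bids, key=average_profit)


# Three price scenarios stand in for samples from a stochastic price model.
best = saa_best_bid(candidate_bids=[0, 5, 10, 15],
                    sampled_prices=[4, 8, 12],
                    value=10)
print(best)
```

Here the bid equal to the good's value wins exactly the scenarios with non-negative profit, so it beats both the lower and the higher candidates on average.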

preprint, 2012, arXiv

An Algorithm for Computing Stochastically Stable Distributions with Applications to Multiagent Learning in Repeated Games

One of the proposed solutions to the equilibrium selection problem for agents learning in repeated games is obtained via the notion of stochastic stability. Learning algorithms are perturbed so that the Markov chain underlying the learning dynamics is necessarily irreducible and yields a unique stable distribution. The stochastically stable distribution is the limit of these stable distributions as the perturbation rate tends to zero. We present the first exact algorithm for computing the stochastically stable distribution of a Markov chain. We use our algorithm to predict the long-term dynamics of simple learning algorithms in sample repeated games.
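The limiting construction in this abstract can be seen in a two-state toy chain. This is a numerical illustration only, not the exact algorithm the paper presents; the transition probabilities (`eps` to leave state A, `eps ** 2` to leave state B) and the function names are assumptions.

```python
def stationary_two_state(p_ab, p_ba):
    """Stationary distribution (pi_a, pi_b) of a two-state Markov chain
    with transition probabilities p_ab (A -> B) and p_ba (B -> A)."""
    total = p_ab + p_ba
    return (p_ba / total, p_ab / total)


def perturbed_chain(eps):
    """Leaving A takes one 'mistake' (prob eps); leaving B takes two (eps**2)."""
    return stationary_two_state(p_ab=eps, p_ba=eps ** 2)


# For eps > 0 the chain is irreducible with a unique stationary distribution;
# shrinking eps shows where the mass concentrates in the limit.
for eps in (0.1, 0.01, 0.001):
    pi_a, pi_b = perturbed_chain(eps)
    print(eps, pi_a, pi_b)
```

Here `pi_a = eps / (1 + eps)`, so the mass on A vanishes as the perturbation rate tends to zero and the stochastically stable distribution puts all weight on B, the state that is harder to leave.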

preprint, 2012, arXiv

Bidding under Uncertainty: Theory and Experiments

This paper describes a study of agent bidding strategies, assuming combinatorial valuations for complementary and substitutable goods, in three auction environments: sequential auctions, simultaneous auctions, and the Trading Agent Competition (TAC) Classic hotel auction design, a hybrid of sequential and simultaneous auctions. The problem of bidding in sequential auctions is formulated as an MDP, and it is argued that expected marginal utility bidding is the optimal bidding policy. The problem of bidding in simultaneous auctions is formulated as a stochastic program, and it is shown by example that marginal utility bidding is not an optimal bidding policy, even in deterministic settings. Two alternative methods of approximating a solution to this stochastic program are presented: the first method, which relies on expected values, is optimal in deterministic environments; the second method, which samples the nondeterministic environment, is asymptotically optimal as the number of samples tends to infinity. Finally, experiments with these various bidding policies are described in the TAC Classic setting.

preprint, 2012, arXiv

Self-Confirming Price Prediction Strategies for Simultaneous One-Shot Auctions

Bidding in simultaneous auctions is challenging because an agent's value for a good in one auction may depend on the uncertain outcome of other auctions: the so-called exposure problem. Given the gap in understanding of general simultaneous auction games, previous works have tackled this problem with heuristic strategies that employ probabilistic price predictions. We define a concept of self-confirming prices, and show that within an independent private value model, Bayes-Nash equilibrium can be fully characterized as a profile of optimal price prediction strategies with self-confirming predictions. We exhibit practical procedures to compute approximately optimal bids given a probabilistic price prediction, and near self-confirming price predictions given a price-prediction strategy. An extensive empirical game-theoretic analysis demonstrates that self-confirming price prediction strategies are effective in simultaneous auction games with both complementary and substitutable preference structures.