Source author record

Yoram Bachrach

Yoram Bachrach appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Science and Game Theory Artificial Intelligence Multiagent Systems Machine Learning Computational Complexity Computer Vision Data Structures and Algorithms econ.TH Human-Computer Interaction

Catalog footprint

What is connected

20works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models beyond standard Transformers. We introduce a dual-framework approach: AIRA-Compose for high-level architecture search, and AIRA-Design for low-level mechanistic implementation. AIRA-Compose uses 11 agents to explore fundamental computational primitives under a 24-hour budget. Agents evaluate million-parameter candidates, extrapolating top designs to 350M, 1B, and 3B scales. This yields 14 architectures across two families: AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba). Pre-trained at 1B scale, these consistently outperform Llama 3.2 and Composer-found baselines. On downstream tasks, AIRAformer-D and AIRAhybrid-D improve accuracy by 2.4% and 3.8% over Llama 3.2. Furthermore, AIRA-Compose finds models with highly efficient scaling frontiers: AIRAformer-C scales 54% and 71% faster than Llama 3.2 and Composer's best Transformer, while AIRAhybrid-C outscales Nemotron-2 by 23% and Composer's best hybrid by 37%. AIRA-Design tasks 20 agents with writing novel attention mechanisms for long-range dependencies and high-performing training scripts. On the Long Range Arena benchmark, agent-designed architectures reach within 2.3% and 2.6% of human state-of-the-art on document matching and text classification. On the Autoresearch benchmark, Greedy Opus 4.5 achieves 0.968 validation bits-per-byte under a fixed time budget, surpassing the published minimum. Together, these frameworks show AI agents can autonomously discover architectures and algorithmic optimizations matching or surpassing hand-designed baselines. This establishes a powerful paradigm for discovering next-generation foundation models, marking a clear step toward recursive self-improvement.

preprint2022arXiv

D3C: Reducing the Price of Anarchy in Multi-Agent Learning

In multiagent systems, the complex interaction of fixed incentives can lead agents to outcomes that are poor (inefficient) not only for the group, but also for each individual. Price of anarchy is a technical, game-theoretic definition that quantifies the inefficiency arising in these scenarios -- it compares the welfare that can be achieved through perfect coordination against that achieved by self-interested agents at a Nash equilibrium. We derive a differentiable, upper bound on a price of anarchy that agents can cheaply estimate during learning. Equipped with this estimator, agents can adjust their incentives in a way that improves the efficiency incurred at a Nash equilibrium. Agents do so by learning to mix their reward (equiv. negative loss) with that of other agents by following the gradient of our derived upper bound. We refer to this approach as D3C. In the case where agent incentives are differentiable, D3C resembles the celebrated Win-Stay, Lose-Shift strategy from behavioral game theory, thereby establishing a connection between the global goal of maximum welfare and an established agent-centric learning rule. In the non-differentiable setting, as is common in multiagent reinforcement learning, we show the upper bound can be reduced via evolutionary strategies, until a compromise is reached in a distributed fashion. We demonstrate that D3C improves outcomes for each agent and the group as a whole on several social dilemmas including a traffic network exhibiting Braess's paradox, a prisoner's dilemma, and several multiagent domains.

preprint2022arXiv

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.

preprint2022arXiv

Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members

In many multi-agent settings, participants can form teams to achieve collective outcomes that may far surpass their individual capabilities. Measuring the relative contributions of agents and allocating them shares of the reward that promote long-lasting cooperation are difficult tasks. Cooperative game theory offers solution concepts identifying distribution schemes, such as the Shapley value, that fairly reflect the contribution of individuals to the performance of the team or the Core, which reduces the incentive of agents to abandon their team. Applications of such methods include identifying influential features and sharing the costs of joint ventures or team formation. Unfortunately, using these solutions requires tackling a computational barrier as they are hard to compute, even in restricted settings. In this work, we show how cooperative game-theoretic solutions can be distilled into a learned model by training neural networks to propose fair and stable payoff allocations. We show that our approach creates models that can generalize to games far from the training distribution and can predict solutions for more players than observed during training. An important application of our framework is Explainable AI: our approach can be used to speed-up Shapley value computations on many instances.

preprint2022arXiv

Role of Human-AI Interaction in Selective Prediction

Recent work has shown the potential benefit of selective prediction systems that can learn to defer to a human when the predictions of the AI are unreliable, particularly to improve the reliability of AI systems in high-stakes applications like healthcare or conservation. However, most prior work assumes that human behavior remains unchanged when they solve a prediction task as part of a human-AI team as opposed to by themselves. We show that this is not the case by performing experiments to quantify human-AI interaction in the context of selective prediction. In particular, we study the impact of communicating different types of information to humans about the AI system's decision to defer. Using real-world conservation data and a selective prediction system that improves expected accuracy over that of the human or AI system working individually, we show that this messaging has a significant impact on the accuracy of human judgements. Our results study two components of the messaging strategy: 1) Whether humans are informed about the prediction of the AI system and 2) Whether they are informed about the decision of the selective prediction system to defer. By manipulating these messaging components, we show that it is possible to significantly boost human performance by informing the human of the decision to defer, but not revealing the prediction of the AI. We therefore show that it is vital to consider how the decision to defer is communicated to a human when designing selective prediction systems, and that the composite accuracy of a human-AI team must be carefully evaluated using a human-in-the-loop framework.

preprint2022arXiv

Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash, however, we demonstrate through extensive experiments the ability of this approach to approximate a unique Nash in normal-form games with as many as seven players and twenty one actions (several billion outcomes) that are orders of magnitude larger than those possible with prior algorithms.

preprint2022arXiv

Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering

Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than $1$) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.

preprint2020arXiv

Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games

Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum games, the challenge is usually viewed as finding Nash equilibrium strategies, safeguarding against exploitation regardless of the opponent. While this captures the intricacies of chess or Go, it avoids the notion of cooperation with co-players, a hallmark of the major transitions leading from unicellular organisms to human civilization. Beyond two players, alliance formation often confers an advantage; however this requires trust, namely the promise of mutual cooperation in the face of incentives to defect. Successful play therefore requires adaptation to co-players rather than the pursuit of non-exploitability. Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research. Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma, and empirically that naïve multi-agent reinforcement learning therefore fails to form alliances. We introduce a toy model of economic competition, and show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances. Finally, we generalize our agent model to incorporate temporally-extended contracts, presenting opportunities for further work.

preprint2016arXiv

Predicting Personal Traits from Facial Images using Convolutional Neural Networks Augmented with Facial Landmark Information

We consider the task of predicting various traits of a person given an image of their face. We estimate both objective traits, such as gender, ethnicity and hair-color; as well as subjective traits, such as the emotion a person expresses or whether he is humorous or attractive. For sizeable experimentation, we contribute a new Face Attributes Dataset (FAD), having roughly 200,000 attribute labels for the above traits, for over 10,000 facial images. Due to the recent surge of research on Deep Convolutional Neural Networks (CNNs), we begin by using a CNN architecture for estimating facial attributes and show that they indeed provide an impressive baseline performance. To further improve performance, we propose a novel approach that incorporates facial landmark information for input images as an additional channel, helping the CNN learn better attribute-specific features so that the landmarks across various training images hold correspondence. We empirically analyse the performance of our method, showing consistent improvement over the baseline across traits.

preprint2014arXiv

False-Name Manipulations in Weighted Voting Games

Weighted voting is a classic model of cooperation among agents in decision-making domains. In such games, each player has a weight, and a coalition of players wins the game if its total weight meets or exceeds a given quota. A players power in such games is usually not directly proportional to his weight, and is measured by a power index, the most prominent among which are the Shapley-Shubik index and the Banzhaf index.In this paper, we investigate by how much a player can change his power, as measured by the Shapley-Shubik index or the Banzhaf index, by means of a false-name manipulation, i.e., splitting his weight among two or more identities. For both indices, we provide upper and lower bounds on the effect of weight-splitting. We then show that checking whether a beneficial split exists is NP-hard, and discuss efficient algorithms for restricted cases of this problem, as well as randomized algorithms for the general case. We also provide an experimental evaluation of these algorithms. Finally, we examine related forms of manipulative behavior, such as annexation, where a player subsumes other players, or merging, where several players unite into one. We characterize the computational complexity of such manipulations and provide limits on their effects. For the Banzhaf index, we describe a new paradox, which we term the Annexation Non-monotonicity Paradox.

preprint2014arXiv

Optimising Trade-offs Among Stakeholders in Ad Auctions

We examine trade-offs among stakeholders in ad auctions. Our metrics are the revenue for the utility of the auctioneer, the number of clicks for the utility of the users and the welfare for the utility of the advertisers. We show how to optimize linear combinations of the stakeholder utilities, showing that these can be tackled through a GSP auction with a per-click reserve price. We then examine constrained optimization of stakeholder utilities. We use simulations and analysis of real-world sponsored search auction data to demonstrate the feasible trade-offs, examining the effect of changing the allowed number of ads on the utilities of the stakeholders. We investigate both short term effects, when the players do not have the time to modify their behavior, and long term equilibrium conditions. Finally, we examine a combinatorially richer constrained optimization problem, where there are several possible allowed configurations (templates) of ad formats. This model captures richer ad formats, which allow using the available screen real estate in various ways. We show that two natural generalizations of the GSP auction rules to this domain are poorly behaved, resulting in not having a symmetric Nash equilibrium or having one with poor welfare. We also provide positive results for restricted cases.

preprint2014arXiv

Power Distribution in Randomized Weighted Voting: the Effects of the Quota

We study the Shapley value in weighted voting games. The Shapley value has been used as an index for measuring the power of individual agents in decision-making bodies and political organizations, where decisions are made by a majority vote process. We characterize the impact of changing the quota (i.e., the minimum number of seats in the parliament that are required to form a coalition) on the Shapley values of the agents. Contrary to previous studies, which assumed that the agent weights (corresponding to the size of a caucus or a political party) are fixed, we analyze new domains in which the weights are stochastically generated, modelling, for example, elections processes. We examine a natural weight generation process: the Balls and Bins model, with uniform as well as exponentially decaying probabilities. We also analyze weights that admit a super-increasing sequence, answering several open questions pertaining to the Shapley values in such games.

preprint2014arXiv

Sharing Rewards in Cooperative Connectivity Games

We consider how selfish agents are likely to share revenues derived from maintaining connectivity between important network servers. We model a network where a failure of one node may disrupt communication between other nodes as a cooperative game called the vertex Connectivity Game (CG). In this game, each agent owns a vertex, and controls all the edges going to and from that vertex. A coalition of agents wins if it fully connects a certain subset of vertices in the graph, called the primary vertices. Power indices measure an agents ability to affect the outcome of the game. We show that in our domain, such indices can be used to both determine the fair share of the revenues an agent is entitled to, and identify significant possible points of failure affecting the reliability of communication in the network. We show that in general graphs, calculating the Shapley and Banzhaf power indices is #P-complete, but suggest a polynomial algorithm for calculating them in trees. We also investigate finding stable payoff divisions of the revenues in CGs, captured by the game theoretic solution of the core, and its relaxations, the epsilon-core and least core. We show a polynomial algorithm for computing the core of a CG, but show that testing whether an imputation is in the epsilon-core is coNP-complete. Finally, we show that for trees, it is possible to test for epsilon-core imputations in polynomial time.

preprint2013arXiv

Incentives and Efficiency in Uncertain Collaborative Environments

We consider collaborative systems where users make contributions across multiple available projects and are rewarded for their contributions in individual projects according to a local sharing of the value produced. This serves as a model of online social computing systems such as online Q&A forums and of credit sharing in scientific co-authorship settings. We show that the maximum feasible produced value can be well approximated by simple local sharing rules where users are approximately rewarded in proportion to their marginal contributions and that this holds even under incomplete information about the player's abilities and effort constraints. For natural instances we show almost 95% optimality at equilibrium. When players incur a cost for their effort, we identify a threshold phenomenon: the efficiency is a constant fraction of the optimal when the cost is strictly convex and decreases with the number of players if the cost is linear.

preprint2013arXiv

Optimal Coalition Structures in Cooperative Graph Games

Representation languages for coalitional games are a key research area in algorithmic game theory. There is an inherent tradeoff between how general a language is, allowing it to capture more elaborate games, and how hard it is computationally to optimize and solve such games. One prominent such language is the simple yet expressive Weighted Graph Games (WGGs) representation [14], which maintains knowledge about synergies between agents in the form of an edge weighted graph. We consider the problem of finding the optimal coalition structure in WGGs. The agents in such games are vertices in a graph, and the value of a coalition is the sum of the weights of the edges present between coalition members. The optimal coalition structure is a partition of the agents to coalitions, that maximizes the sum of utilities obtained by the coalitions. We show that finding the optimal coalition structure is not only hard for general graphs, but is also intractable for restricted families such as planar graphs which are amenable for many other combinatorial problems. We then provide algorithms with constant factor approximations for planar, minor-free and bounded degree graphs.

preprint2013arXiv

Strong Price of Anarchy and Coalitional Dynamics

We introduce a framework for studying the effect of cooperation on the quality of outcomes in utility games. Our framework is a coalitional analog of the smoothness framework of non-cooperative games. Coalitional smoothness implies bounds on the strong price of anarchy, the loss of quality of coalitionally stable outcomes, as well as bounds on coalitional versions of coarse correlated equilibria and sink equilibria, which we define as out-of-equilibrium myopic behavior as determined by a natural coalitional version of best-response dynamics. Our coalitional smoothness framework captures existing results bounding the strong price of anarchy of network design games. We show that in any monotone utility-maximization game, if each player's utility is at least his marginal contribution to the welfare, then the strong price of anarchy is at most 2. This captures a broad class of games, including games with a very high price of anarchy. Additionally, we show that in potential games the strong price of anarchy is close to the price of stability, the quality of the best Nash equilibrium.

preprint2012arXiv

How To Grade a Test Without Knowing the Answers --- A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing

We propose a new probabilistic graphical model that jointly models the difficulties of questions, the abilities of participants and the correct answers to questions in aptitude testing and crowdsourcing settings. We devise an active learning/adaptive testing scheme based on a greedy minimization of expected model entropy, which allows a more efficient resource allocation by dynamically choosing the next question to be asked based on the previous responses. We present experimental results that confirm the ability of our model to infer the required parameters and demonstrate that the adaptive testing scheme requires fewer questions to obtain the same accuracy as a static test scenario.

preprint2012arXiv

Solving Cooperative Reliability Games

Cooperative games model the allocation of profit from joint actions, following considerations such as stability and fairness. We propose the reliability extension of such games, where agents may fail to participate in the game. In the reliability extension, each agent only "survives" with a certain probability, and a coalition's value is the probability that its surviving members would be a winning coalition in the base game. We study prominent solution concepts in such games, showing how to approximate the Shapley value and how to compute the core in games with few agent types. We also show that applying the reliability extension may stabilize the game, making the core non-empty even when the base game has an empty core.

preprint2010arXiv

Fast Pseudo-Random Fingerprints

We propose a method to exponentially speed up computation of various fingerprints, such as the ones used to compute similarity and rarity in massive data sets. Rather then maintaining the full stream of $b$ items of a universe $[u]$, such methods only maintain a concise fingerprint of the stream, and perform computations using the fingerprints. The computations are done approximately, and the required fingerprint size $k$ depends on the desired accuracy $ε$ and confidence $δ$. Our technique maintains a single bit per hash function, rather than a single integer, thus requiring a fingerprint of length $k = O(\frac{\ln \frac{1}δ}{ε^2})$ bits, rather than $O(\log u \cdot \frac{\ln \frac{1}δ}{ε^2})$ bits required by previous approaches. The main advantage of the fingerprints we propose is that rather than computing the fingerprint of a stream of $b$ items in time of $O(b \cdot k)$, we can compute it in time $O(b \log k)$. Thus this allows an exponential speedup for the fingerprint construction, or alternatively allows achieving a much higher accuracy while preserving computation time. Our methods rely on a specific family of pseudo-random hashes for which we can quickly locate hashes resulting in small values.

preprint2009arXiv

The Cost of Stability in Coalitional Games

A key question in cooperative game theory is that of coalitional stability, usually captured by the notion of the \emph{core}--the set of outcomes such that no subgroup of players has an incentive to deviate. However, some coalitional games have empty cores, and any outcome in such a game is unstable. In this paper, we investigate the possibility of stabilizing a coalitional game by using external payments. We consider a scenario where an external party, which is interested in having the players work together, offers a supplemental payment to the grand coalition (or, more generally, a particular coalition structure). This payment is conditional on players not deviating from their coalition(s). The sum of this payment plus the actual gains of the coalition(s) may then be divided among the agents so as to promote stability. We define the \emph{cost of stability (CoS)} as the minimal external payment that stabilizes the game. We provide general bounds on the cost of stability in several classes of games, and explore its algorithmic properties. To develop a better intuition for the concepts we introduce, we provide a detailed algorithmic study of the cost of stability in weighted voting games, a simple but expressive class of games which can model decision-making in political bodies, and cooperation in multiagent settings. Finally, we extend our model and results to games with coalition structures.

Yoram Bachrach

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

D3C: Reducing the Price of Anarchy in Multi-Agent Learning

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members

Role of Human-AI Interaction in Selective Prediction

Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering

Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games

Predicting Personal Traits from Facial Images using Convolutional Neural Networks Augmented with Facial Landmark Information

False-Name Manipulations in Weighted Voting Games

Optimising Trade-offs Among Stakeholders in Ad Auctions

Power Distribution in Randomized Weighted Voting: the Effects of the Quota

Sharing Rewards in Cooperative Connectivity Games

Incentives and Efficiency in Uncertain Collaborative Environments

Optimal Coalition Structures in Cooperative Graph Games

Strong Price of Anarchy and Coalitional Dynamics

How To Grade a Test Without Knowing the Answers --- A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing

Solving Cooperative Reliability Games

Fast Pseudo-Random Fingerprints

The Cost of Stability in Coalitional Games