Researcher profile

Noam Brown

Noam Brown contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 (Unverified) | Verification L1 | Unclaimed author
5 works
0 followers
8 topics
4 close collaborators

Published work

5 published items

Preprint, 2022, arXiv

Equilibrium Finding in Normal-Form Games Via Greedy Regret Minimization

We extend the classic regret minimization framework for approximating equilibria in normal-form games by greedily weighing iterates based on regrets observed at runtime. Theoretically, our method retains all previous convergence rate guarantees. Empirically, experiments on large randomly generated games and normal-form subgames of the AI benchmark Diplomacy show that greedy weights outperform previous methods whenever sampling is used, sometimes by several orders of magnitude.
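
For readers unfamiliar with the "classic regret minimization framework" this abstract builds on, the following is a minimal sketch of plain regret matching with uniformly averaged iterates on a small two-player zero-sum matrix game. It is not the paper's greedy-weights method. The payoff matrix, function names, and iteration count are assumptions chosen for illustration only.

```python
# Illustrative sketch: plain regret matching with uniform averaging.
# This is NOT the greedy-weighting method from the paper above.
# All names and the example payoff matrix are hypothetical.

def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_zero_sum(payoff, iterations=20000):
    """Self-play regret matching; payoff[i][j] is the row player's utility."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)
        # Expected utility of every pure action against the opponent's mix.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(x[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(y[j] * col_values[j] for j in range(n_cols))
        # Regret: how much better each pure action would have done.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += y[j]
    # The uniformly averaged iterates approximate an equilibrium.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def exploitability(payoff, x, y):
    """Sum of both players' gains from best-responding to (x, y)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


# Hypothetical 2x2 zero-sum game; its equilibrium has both players
# choosing their first action with probability 2/5.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]
avg_x, avg_y = solve_zero_sum(PAYOFF)
print(avg_x, avg_y, exploitability(PAYOFF, avg_x, avg_y))
```

The sketch gives every iterate equal weight when averaging. As the abstract describes it, the paper's contribution is to choose those weights greedily from the regrets observed at runtime instead.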

Preprint, 2022, arXiv

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search based on the KL divergence from an imitation-learned policy results in higher human prediction accuracy and stronger performance than imitation learning alone. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that using this algorithm for search in no-press Diplomacy yields a policy that matches the human prediction accuracy of imitation learning while being substantially stronger.

Preprint, 2021, arXiv

Safe Search for Stackelberg Equilibria in Extensive-Form Games

Stackelberg equilibrium is a solution concept in two-player games where the leader has commitment rights over the follower. In recent years, it has become a cornerstone of many security applications, including airport patrolling and wildlife poaching prevention. Even though many of these settings are sequential in nature, existing techniques pre-compute the entire solution ahead of time. In this paper, we present a theoretically sound and empirically effective way to apply search, which leverages extra online computation to improve a solution, to the computation of Stackelberg equilibria in general-sum games. Instead of the leader attempting to solve the full game upfront, an approximate "blueprint" solution is first computed offline and is then improved online for the particular subgames encountered in actual play. We prove that our search technique is guaranteed to perform no worse than the pre-computed blueprint strategy, and empirically demonstrate that it enables approximately solving significantly larger games compared to purely offline methods. We also show that our search operation may be cast as a smaller Stackelberg problem, making our method complementary to existing algorithms based on strategy generation.

Preprint, 2020, arXiv

Unlocking the Potential of Deep Counterfactual Value Networks

Deep counterfactual value networks combined with continual resolving provide a way to conduct depth-limited search in imperfect-information games. However, since their introduction in the DeepStack poker AI, deep counterfactual value networks have not seen widespread adoption. In this paper we introduce several improvements to deep counterfactual value networks, as well as counterfactual regret minimization, and analyze the effects of each change. We combined these improvements to create the poker AI Supremus. We show that while a reimplementation of DeepStack loses head-to-head against the strong benchmark agent Slumbot, Supremus successfully beats Slumbot by an extremely large margin and also achieves a lower exploitability than DeepStack against a local best response. Together, these results show that with our key improvements, deep counterfactual value networks can achieve state-of-the-art performance.

Preprint, 2013, arXiv

Controlling the Electronic Properties of Nanodiamonds Via Surface Chemical Functionalization: A DFT Study

The electronic properties of chemically functionalized nanodiamonds are studied using density functional theory calculations. HOMO-LUMO gap and relative stabilities are calculated for different surface functionalization schemes and diamond nanocrystal morphologies. The effects of chemical decoration on the size and nature of the HOMO-LUMO gap of the various systems considered are discussed in detail. We conclude that surface chemical functionalization has the potential to become an accessible route for controlling the electronic properties of nanodiamonds.