Source author record

Andrew Critch

Andrew Critch appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Computer Science and Game Theory Multiagent Systems Neural and Evolutionary Computing cs.CY Logic in Computer Science Cryptography and Security math-ph math.MP Methodology quant-ph

Catalog footprint

What is connected

12works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory

It is increasingly possible for real-world agents, such as software-based agents or human institutions, to view the internal programming of other such agents that they interact with. For instance, a company can read the bylaws of another company, or one software system can read the source code of another. Game-theoretic equilibria between the designers of such agents are called \emph{program equilibria}, and we call this area \emph{open-source game theory}. In this work we demonstrate a series of counterintuitive results on open-source games, which are independent of the programming language in which agents are written. We show that certain formal institution designs that one might expect to defect against each other will instead turn out to cooperate, or conversely, cooperate when one might expect them to defect. The results hold in a setting where each institution has full visibility into the other institution's true operating procedures. We also exhibit examples and ten open problems for better understanding these phenomena. We argue that contemporary game theory remains ill-equipped to study program equilibria, given that even the outcomes of single games in open-source settings remain counterintuitive and poorly understood. Nonetheless, some of these open-source agents exhibit desirable characteristics -- e.g., they can unexploitably create incentives for cooperation and legibility from other agents -- such that analyzing them could yield considerable benefits.

preprint2022arXiv

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.

preprint2022arXiv

Pruned Neural Networks are Surprisingly Modular

The learned weights of a neural network are often considered devoid of scrutable internal structure. To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images. Our notion of modularity comes from the graph clustering literature: a "module" is a set of neurons with strong internal connectivity but weak external connectivity. We find that training and weight pruning produces MLPs that are more modular than randomly initialized ones, and often significantly more modular than random MLPs with the same (sparse) distribution of weights. Interestingly, they are much more modular when trained with dropout. We also present exploratory analyses of the importance of different modules for performance and how modules depend on each other. Understanding the modular structure of neural networks, when such structure exists, will hopefully render their inner workings more interpretable to engineers. Note that this paper has been superceded by "Clusterability in Neural Networks", arxiv:2103.03386 and "Quantifying Local Specialization in Deep Neural Networks", arxiv:2110.08058!

preprint2022arXiv

Quantifying Local Specialization in Deep Neural Networks

A neural network is locally specialized to the extent that parts of its computational graph (i.e. structure) can be abstractly represented as performing some comprehensible sub-task relevant to the overall task (i.e. functionality). Are modern deep neural networks locally specialized? How can this be quantified? In this paper, we consider the problem of taking a neural network whose neurons are partitioned into clusters, and quantifying how functionally specialized the clusters are. We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs. To measure these proxies, we develop a set of statistical methods based on techniques conventionally used to interpret individual neurons. We apply the proxies to partitionings generated by spectrally clustering a graph representation of the network's neurons with edges determined either by network weights or correlations of activations. We show that these partitionings, even ones based only on weights (i.e. strictly from non-runtime analysis), reveal groups of neurons that are important and coherent. These results suggest that graph-based partitioning can reveal local specialization and that statistical methods can be used to automatedly screen for sets of neurons that can be understood abstractly.

preprint2022arXiv

WordSig: QR streams enabling platform-independent self-identification that's impossible to deepfake

Deepfakes can degrade the fabric of society by limiting our ability to trust video content from leaders, authorities, and even friends. Cryptographically secure digital signatures may be used by video streaming platforms to endorse content, but these signatures are applied by the content distributor rather than the participants in the video. We introduce WordSig, a simple protocol allowing video participants to digitally sign the words they speak using a stream of QR codes, and allowing viewers to verify the consistency of signatures across videos. This allows establishing a trusted connection between the viewer and the participant that is not mediated by the content distributor. Given the widespread adoption of QR codes for distributing hyperlinks and vaccination records, and the increasing prevalence of celebrity deepfakes, 2022 or later may be a good time for public figures to begin using and promoting QR-based self-authentication tools.

preprint2021arXiv

Clusterability in Neural Networks

The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.

preprint2021arXiv

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

A wide range of reinforcement learning (RL) problems - including robustness, transfer learning, unsupervised RL, and emergent complexity - require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent's learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments which maximize regret, defined as the difference between the protagonist and antagonist agent's return. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.

preprint2020arXiv

AI Research Considerations for Human Existential Safety (ARCHES)

Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species. In negative terms, we ask what existential risks humanity might face from AI development in the next century, and by what principles contemporary technical research might be directed to address those risks. A key property of hypothetical AI technologies is introduced, called \emph{prepotence}, which is useful for delineating a variety of potential existential risks from artificial intelligence, even as AI paradigms might shift. A set of \auxref{dirtot} contemporary research \directions are then examined for their potential benefit to existential safety. Each research direction is explained with a scenario-driven motivation, and examples of existing work from which to build. The research directions present their own risks and benefits to society that could occur at various scales of impact, and in particular are not guaranteed to benefit existential safety if major developments in them are deployed without adequate forethought and oversight. As such, each direction is accompanied by a consideration of potentially negative side effects.

preprint2020arXiv

Multi-Principal Assistance Games: Definition and Collegial Mechanisms

We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism. In an MPAG, a single agent assists N human principals who may have widely different preferences. MPAGs generalize assistance games, also known as cooperative inverse reinforcement learning games. We analyze in particular a generalization of apprenticeship learning in which the humans first perform some work to obtain utility and demonstrate their preferences, and then the robot acts to further maximize the sum of human payoffs. We show in this setting that if the game is sufficiently collegial, i.e. if the humans are responsible for obtaining a sufficient fraction of the rewards through their own actions, then their preferences are straightforwardly revealed through their work. This revelation mechanism is non-dictatorial, does not limit the possible outcomes to two alternatives, and is dominant-strategy incentive-compatible.

preprint2016arXiv

Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents

Löb's theorem and Gödel's theorems make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb's theorem which is applicable given such bounded resources. These results have powerful implications for the game theory of bounded agents who are able to write proofs about themselves and one another, including the capacity to out-perform classical Nash equilibria and correlated equilibria, attaining mutually cooperative program equilibrium in the Prisoner's Dilemma. Previous cooperative program equilibria studied by Tennenholtz (2004) and Fortnow (2009) have depended on tests for program equality, a fragile condition, whereas "Löbian" cooperation is much more robust and agnostic of the opponent's implementation.

preprint2014arXiv

Algebraic Geometry of Matrix Product States

We quantify the representational power of matrix product states (MPS) for entangled qubit systems by giving polynomial expressions in a pure quantum state's amplitudes which hold if and only if the state is a translation invariant matrix product state or a limit of such states. For systems with few qubits, we give these equations explicitly, considering both periodic and open boundary conditions. Using the classical theory of trace varieties and trace algebras, we explain the relationship between MPS and hidden Markov models and exploit this relationship to derive useful parameterizations of MPS. We make four conjectures on the identifiability of MPS parameters.

preprint2012arXiv

A note on the proportionality between some consistency indices in the AHP

Analyzing the consistency of preferences is an important step in decision making with pairwise comparison matrices, and several indices have been proposed in order to estimate it. In this paper we prove the proportionality between some consistency indices in the framework of the Analytic Hierarchy Process. Knowing such equivalences eliminates redundancy in the consideration of evidence for consistent preferences.

Andrew Critch

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Pruned Neural Networks are Surprisingly Modular

Quantifying Local Specialization in Deep Neural Networks

WordSig: QR streams enabling platform-independent self-identification that's impossible to deepfake

Clusterability in Neural Networks

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

AI Research Considerations for Human Existential Safety (ARCHES)

Multi-Principal Assistance Games: Definition and Collegial Mechanisms

Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents

Algebraic Geometry of Matrix Product States

A note on the proportionality between some consistency indices in the AHP