Source author record

Stuart Armstrong

Stuart Armstrong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, math.DG, hep-th, Human-Computer Interaction, math-ph, math.GM, math.MP

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

preprint · 2022 · arXiv

Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AI

As artificial intelligence becomes more powerful and a ubiquitous presence in daily life, it is imperative to understand and manage the impact of AI systems on our lives and decisions. Modern ML systems often change user behavior (e.g. personalized recommender systems learn user preferences to deliver recommendations that change online behavior). An externality of behavior change is preference change. This article argues for the establishment of a multidisciplinary endeavor focused on understanding how AI systems change preference: Preference Science. We operationalize preference to incorporate concepts from various disciplines, outlining the importance of meta-preferences and preference-change preferences, and proposing a preliminary framework for how preferences change. We draw a distinction between preference change, permissible preference change, and outright preference manipulation. A diversity of disciplines contribute unique insights to this framework.

preprint · 2022 · arXiv

The dangers in algorithms learning humans' values and irrationalities

For an artificial intelligence (AI) to be aligned with human values (or human preferences), it must first learn those values. AI systems that are trained on human behaviour risk miscategorising human irrationalities as human values, and then optimising for these irrationalities. Simply learning human values still carries risks: an AI learning them will inevitably also gain information on human irrationalities and human behaviour/policy. Both of these can be dangerous: knowing human policy allows an AI to become generically more powerful (whether it is partially aligned or not aligned at all), while learning human irrationalities allows it to exploit humans without needing to provide value in return. This paper analyses the danger in developing artificial intelligence that learns about human irrationalities and human policy, and constructs a model recommendation system with various levels of information about human biases, human policy, and human values. It concludes that, whatever the power and knowledge of the AI, it is more dangerous for it to know human irrationalities than human values. Thus it is better for the AI to learn human values directly, rather than learning human biases and then deducing values from behaviour.

preprint · 2020 · arXiv

Pitfalls of learning a reward function online

In some agent designs, such as inverse reinforcement learning, an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual ("one life") learning approach where the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one direction, refusing to learn, "learning" facts already known to the agent, and making decisions that are strictly dominated (for all relevant reward functions). We formally introduce two desirable properties: the first is "unriggability", which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise. The second is "uninfluenceability", whereby the reward-function learning process operates by learning facts about the environment. We show that an uninfluenceable process is automatically unriggable, and if the set of possible environments is sufficiently rich, the converse is true too.

preprint · 2012 · arXiv

Courant Algebroids in Parabolic Geometry

Let $p$ be a Lie subalgebra of a semisimple Lie algebra $g$ and $(G,P)$ be the corresponding pair of connected Lie groups. A Cartan geometry of type $(G,P)$ associates to a smooth manifold $M$ a principal $P$-bundle and a Cartan connection, and a parabolic geometry is a Cartan geometry where $P$ is parabolic. We show that if $P$ is parabolic, the adjoint tractor bundle of a Cartan geometry, which is isomorphic to the Atiyah algebroid of the principal $P$-bundle, admits the structure of a (pre-)Courant algebroid, and we identify the topological obstruction to the bracket being a Courant bracket. For semisimple $G$, the Atiyah algebroid of the principal $P$-bundle associated to the Cartan geometry of $(G,P)$ admits a pre-Courant algebroid structure if and only if $P$ is parabolic.

preprint · 2011 · arXiv

Note on pre-Courant algebroid structures for parabolic geometries

This note aims to demonstrate that every parabolic geometry has a naturally defined pre-Courant algebroid structure. This structure is a Courant algebroid if and only if the curvature $\kappa$ of the Cartan connection vanishes. In all other cases, if the parabolic geometry is regular, there does not exist a natural universal expression for a Courant bracket.

preprint · 2006 · arXiv

Ambient connections realising conformal Tractor holonomy

For a conformal manifold we introduce the notion of an ambient connection, an affine connection on an ambient manifold of the conformal manifold, possibly with torsion, and with conditions relating it to the conformal structure. The purpose of this construction is to realise the normal conformal tractor holonomy as affine holonomy of such a connection. We give an example of an ambient connection for which this is the case, and which is torsion free if we start the construction with a $C$-space, and in addition Ricci-flat if we start with an Einstein manifold. Thus for a $C$-space this example leads to an ambient metric in the weaker sense of Čap and Gover, and for an Einstein space to a Ricci-flat ambient metric in the sense of Fefferman and Graham.