Researcher profile

Peter Vamplew

Peter Vamplew contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2024arXiv

An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments

One common approach to solve multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However issues can arise with this approach in the context of stochastic environments, particularly when optimising for the Scalarised Expected Reward (SER) criterion. This paper extends prior research, providing a detailed examination of the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal policy for an environment with stochastic state transitions. We empirically examine several variations of the core multi-objective Q-learning algorithm as well as reward engineering approaches, and demonstrate the limitations of these methods. In particular, we highlight the critical impact of the noisy Q-value estimates issue on the stability and convergence of these algorithms.

preprint2022arXiv

Evaluating Human-like Explanations for Robot Actions in Reinforcement Learning Scenarios

Explainable artificial intelligence is a research field that tries to provide more transparency for autonomous intelligent systems. Explainability has been used, particularly in reinforcement learning and robotic scenarios, to better understand the robot decision-making process. Previous work, however, has been widely focused on providing technical explanations that can be better understood by AI practitioners than non-expert end-users. In this work, we make use of human-like explanations built from the probability of success to complete the goal that an autonomous robot shows after performing an action. These explanations are intended to be understood by people who have no or very little experience with artificial intelligence methods. This paper presents a user trial to study whether these explanations that focus on the probability an action has of succeeding in its goal constitute a suitable explanation for non-expert end-users. The results obtained show that non-expert participants rate robot explanations that focus on the probability of success higher and with less variance than technical explanations generated from Q-values, and also favor counterfactual explanations over standalone explanations.

preprint2022arXiv

Human Engagement Providing Evaluative and Informative Advice for Interactive Reinforcement Learning

Interactive reinforcement learning proposes the use of externally-sourced information in order to speed up the learning process. When interacting with a learner agent, humans may provide either evaluative or informative advice. Prior research has focused on the effect of human-sourced advice by including real-time feedback on the interactive reinforcement learning process, specifically aiming to improve the learning speed of the agent, while minimising the time demands on the human. This work focuses on answering which of two approaches, evaluative or informative, is the preferred instructional approach for humans. Moreover, this work presents an experimental setup for a human-trial designed to compare the methods people use to deliver advice in terms of human engagement. The results obtained show that users giving informative advice to the learner agents provide more accurate advice, are willing to assist the learner agent for a longer time, and provide more advice per episode. Additionally, self-evaluation from participants using the informative approach has indicated that the agent's ability to follow the advice is higher, and therefore, they feel their own advice to be of higher accuracy when compared to people providing evaluative advice.

preprint2021arXiv

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

preprint2021arXiv

Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

The recent paper `"Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, it is still undesirable to use this approach for the development of artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.

preprint2020arXiv

A Multi-Objective Deep Reinforcement Learning Framework

This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment and three-objective Mountain Car problem) indicate that the proposed framework is able to find the Pareto-optimal solutions effectively. The proposed framework is generic and highly modularized, which allows the integration of different deep reinforcement learning algorithms in different complex problem domains. This therefore overcomes many disadvantages involved with standard multi-objective reinforcement learning methods in the current literature. The proposed framework acts as a testbed platform that accelerates the development of MODRL for solving increasingly complicated multi-objective problems.

preprint2020arXiv

Discrete-to-Deep Supervised Policy Learning

Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. For years, scholars have got around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally it uses input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method with two RL environments, the Cartpole and an aircraft manoeuvring simulator.