Researcher profile

Daniel Graves

Daniel Graves contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

Composing PROBs

A PROB is a "product and braid" category. Such categories can be used to encode the structure borne by an object in a braided monoidal category. In this paper we provide PROBs whose categories of algebras in a braided monoidal category are equivalent to the categories of monoids and comonoids using the category associated to the braid crossed simplicial group of Fiedorowicz and Loday. We show that PROBs can be composed by generalizing the machinery introduced by Lack for PROPs. We use this to define a PROB for bimonoids in a braided monoidal category as a composite of the PROBs for monoids and comonoids.

preprint2022arXiv

Offline Learning of Counterfactual Predictions for Real-World Robotic Reinforcement Learning

We consider real-world reinforcement learning (RL) of robotic manipulation tasks that involve both visuomotor skills and contact-rich skills. We aim to train a policy that maps multimodal sensory observations (vision and force) to a manipulator's joint velocities under practical considerations. We propose to use offline samples to learn a set of general value functions (GVFs) that make counterfactual predictions from the visual inputs. We show that combining the offline learned counterfactual predictions with force feedbacks in online policy learning allows efficient reinforcement learning given only a terminal (success/failure) reward. We argue that the learned counterfactual predictions form a compact and informative representation that enables sample efficiency and provides auxiliary reward signals that guide online explorations towards contact-rich states. Various experiments in simulation and real-world settings were performed for evaluation. Recordings of the real-world robot training can be found via https://sites.google.com/view/realrl.

preprint2021arXiv

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

Multiagent reinforcement learning (MARL) has achieved a remarkable amount of success in solving various types of video games. A cornerstone of this success is the auto-curriculum framework, which shapes the learning process by continually creating new challenging tasks for agents to adapt to, thereby facilitating the acquisition of new skills. In order to extend MARL methods to real-world domains outside of video games, we envision in this blue sky paper that maintaining a diversity-aware auto-curriculum is critical for successful MARL applications. Specifically, we argue that \emph{behavioural diversity} is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum. We list four open challenges for auto-curriculum techniques, which we believe deserve more attention from this community. Towards validating our vision, we recommend modelling realistic interactive behaviours in autonomous driving as an important test bed, and recommend the SMARTS/ULTRA benchmark.

preprint2020arXiv

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h-1$, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as "the deadly triad"). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and $n$-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad.

preprint2020arXiv

Learning predictive representations in autonomous driving to improve deep reinforcement learning

Reinforcement learning using a novel predictive representation is applied to autonomous driving to accomplish the task of driving between lane markings where substantial benefits in performance and generalization are observed on unseen test roads in both simulation and on a real Jackal robot. The novel predictive representation is learned by general value functions (GVFs) to provide out-of-policy, or counter-factual, predictions of future lane centeredness and road angle that form a compact representation of the state of the agent improving learning in both online and offline reinforcement learning to learn to drive an autonomous vehicle with methods that generalizes well to roads not in the training data. Experiments in both simulation and the real-world demonstrate that predictive representations in reinforcement learning improve learning efficiency, smoothness of control and generalization to roads that the agent was never shown during training, including damaged lane markings. It was found that learning a predictive representation that consists of several predictions over different time scales, or discount factors, improves the performance and smoothness of the control substantially. The Jackal robot was trained in a two step process where the predictive representation is learned first followed by a batch reinforcement learning algorithm (BCQ) from data collected through both automated and human-guided exploration in the environment. We conclude that out-of-policy predictive representations with GVFs offer reinforcement learning many benefits in real-world problems.

preprint2020arXiv

LISPR: An Options Framework for Policy Reuse with Reinforcement Learning

We propose a framework for transferring any existing policy from a potentially unknown source MDP to a target MDP. This framework (1) enables reuse in the target domain of any form of source policy, including classical controllers, heuristic policies, or deep neural network-based policies, (2) attains optimality under suitable theoretical conditions, and (3) guarantees improvement over the source policy in the target MDP. These are achieved by packaging the source policy as a black-box option in the target MDP and providing a theoretically grounded way to learn the option's initiation set through general value functions. Our approach facilitates the learning of new policies by (1) maximizing the target MDP reward with the help of the black-box option, and (2) returning the agent to states in the learned initiation set of the black-box option where it is already optimal. We show that these two variants are equivalent in performance under some conditions. Through a series of experiments in simulated environments, we demonstrate that our framework performs excellently in sparse reward problems given (sub-)optimal source policies and improves upon prior art in transfer methods such as continual learning and progressive networks, which lack our framework's desirable theoretical properties.

preprint2020arXiv

Perception as prediction using general value functions in autonomous driving applications

We propose and demonstrate a framework called perception as prediction for autonomous driving that uses general value functions (GVFs) to learn predictions. Perception as prediction learns data-driven predictions relating to the impact of actions on the agent's perception of the world. It also provides a data-driven approach to predict the impact of the anticipated behavior of other agents on the world without explicitly learning their policy or intentions. We demonstrate perception as prediction by learning to predict an agent's front safety and rear safety with GVFs, which encapsulate anticipation of the behavior of the vehicle in front and in the rear, respectively. The safety predictions are learned through random interactions in a simulated environment containing other agents. We show that these predictions can be used to produce similar control behavior to an LQR-based controller in an adaptive cruise control problem as well as provide advanced warning when the vehicle behind is approaching dangerously. The predictions are compact policy-based predictions that support prediction of the long term impact on safety when following a given policy. We analyze two controllers that use the learned predictions in a racing simulator to understand the value of the predictions and demonstrate their use in the real-world on a Clearpath Jackal robot and an autonomous vehicle platform.