Researcher profile

Jordan Erskine

Jordan Erskine contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
2 topics
1 close collaborator

Published work

2 published items

Preprint, 2022, arXiv

Developing cooperative policies for multi-stage reinforcement learning tasks

Many hierarchical reinforcement learning (HRL) algorithms use a series of independent skills as a basis for solving tasks at a higher level of reasoning. These algorithms do not consider the value of using skills that are cooperative rather than independent. This paper proposes the Cooperative Consecutive Policies (CCP) method, which enables consecutive agents to cooperatively solve long-horizon, multi-stage tasks. The method modifies the policy of each agent to maximise both its own critic and the next agent's critic. Cooperatively maximising critics allows each agent to take actions that benefit its own task as well as subsequent tasks. In a multi-room maze domain and a peg-in-hole manipulation domain, the cooperative policies outperformed a set of naive policies, a single agent trained across the entire domain, and another sequential HRL algorithm.
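The idea of maximising both the current and the next agent's critic can be illustrated with a small sketch. This is not the paper's implementation: the weighted-sum form, the `cooperation` parameter, the discrete action set, the choice to evaluate the next critic on the same state and action, and all function names are assumptions made for illustration only.

```python
from typing import Callable, Hashable, Sequence

# A critic maps a (state, action) pair to an estimated value.
Critic = Callable[[Hashable, Hashable], float]


def cooperative_value(
    state: Hashable,
    action: Hashable,
    current_critic: Critic,
    next_critic: Critic,
    cooperation: float = 0.5,
) -> float:
    """Blend the current and next agents' critic estimates.

    Assumption: a convex combination controlled by `cooperation`.
    0.0 ignores the next agent; 1.0 considers only the next agent.
    """
    if not 0.0 <= cooperation <= 1.0:
        raise ValueError("cooperation must lie in [0, 1]")
    own = current_critic(state, action)
    following = next_critic(state, action)
    return (1.0 - cooperation) * own + cooperation * following


def select_action(
    state: Hashable,
    actions: Sequence[Hashable],
    current_critic: Critic,
    next_critic: Critic,
    cooperation: float = 0.5,
) -> Hashable:
    """Pick the action with the highest blended critic value."""
    return max(
        actions,
        key=lambda a: cooperative_value(
            state, a, current_critic, next_critic, cooperation
        ),
    )


# Hypothetical two-door room: the left door is slightly easier for the
# current agent, but the right door leaves the next agent far better placed.
CURRENT_VALUES = {("room_1", "left"): 1.0, ("room_1", "right"): 0.9}
NEXT_VALUES = {("room_1", "left"): 0.2, ("room_1", "right"): 1.0}


def current_critic(state: Hashable, action: Hashable) -> float:
    return CURRENT_VALUES[(state, action)]


def next_critic(state: Hashable, action: Hashable) -> float:
    return NEXT_VALUES[(state, action)]


if __name__ == "__main__":
    for weight in (0.0, 0.5):
        choice = select_action(
            "room_1", ["left", "right"], current_critic, next_critic, weight
        )
        print(f"cooperation={weight}: {choice}")
```

With these made-up values, an uncooperative agent (`cooperation=0.0`) takes the left door, while a cooperative agent (`cooperation=0.5`) accepts a slightly lower value on its own task and takes the right door, which leaves the next agent better positioned.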

Preprint, 2020, arXiv

Developing cooperative policies for multi-stage tasks

This paper proposes the Cooperative Soft Actor Critic (CSAC) method, which enables consecutive reinforcement learning agents to cooperatively solve a long-horizon, multi-stage task. The method modifies the policy of each agent to maximise both its own critic and the next agent's critic. Cooperatively maximising each agent's critic allows each agent to take actions that benefit its own task as well as subsequent tasks. In a multi-room maze domain, the cooperative policies outperformed both uncooperative policies and a single agent trained across the entire domain. CSAC achieved a success rate at least 20% higher than the uncooperative policies and converged on a solution at least 4 times faster than the single agent.