Source author record

Remi Leblond

Remi Leblond appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Machine Learning, math.PR, Multiagent Systems, Neural and Evolutionary Computing

Catalog footprint

What is connected

2 works

5 topics

4 close collaborators


Preprint, 2020, arXiv

Options as responses: Grounding behavioural hierarchies in multi-agent RL

This paper investigates generalisation in multi-agent games, where the generality of an agent can be evaluated by playing against opponents it has not seen during training. We propose two new games with concealed information and a complex, non-transitive reward structure (think rock/paper/scissors). It turns out that most current deep reinforcement learning methods fail to explore the strategy space efficiently, and so learn policies that generalise poorly to unseen opponents. We then propose a novel hierarchical agent architecture, where the hierarchy is grounded in the game-theoretic structure of the game -- the top level chooses strategic responses to opponents, while the low level implements them as policies over primitive actions. This grounding facilitates credit assignment across the levels of the hierarchy. Our experiments show that the proposed hierarchical agent is capable of generalising to unseen opponents, while conventional baselines fail to generalise at all.
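As a rough illustration of what a non-transitive reward structure means, the sketch below uses classic rock/paper/scissors, the analogy the abstract itself mentions. It is not one of the paper's two games. The names `PAYOFF`, `beats`, `best_response` and `is_transitive` are hypothetical and chosen only for this example.

```python
# Illustrative only: classic rock/paper/scissors, not the games from the paper.
# PAYOFF[i][j] is the reward to the row player choosing ACTIONS[i]
# against a column player choosing ACTIONS[j].
ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def beats(a: str, b: str) -> bool:
    """Return True if action a earns a positive reward against action b."""
    return PAYOFF[ACTIONS.index(a)][ACTIONS.index(b)] > 0


def best_response(opponent: str) -> str:
    """Return the action with the highest reward against a fixed opponent action."""
    j = ACTIONS.index(opponent)
    return max(ACTIONS, key=lambda a: PAYOFF[ACTIONS.index(a)][j])


def is_transitive() -> bool:
    """Check whether 'a beats b' and 'b beats c' always implies 'a beats c'."""
    for a in ACTIONS:
        for b in ACTIONS:
            for c in ACTIONS:
                if beats(a, b) and beats(b, c) and not beats(a, c):
                    return False
    return True


if __name__ == "__main__":
    trained = best_response("rock")
    print("best response to rock:", trained)
    print("reward vs unseen scissors:",
          PAYOFF[ACTIONS.index(trained)][ACTIONS.index("scissors")])
    print("transitive:", is_transitive())
```

A policy that is a best response to one fixed opponent (here, paper against rock) loses to a different, unseen opponent (scissors). This is a toy version of the generalisation failure the abstract describes.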

Preprint, 2011, arXiv

Cutoff phenomenon for the simple exclusion process on the complete graph

We study the time that the simple exclusion process on the complete graph needs to reach equilibrium in terms of total variation distance. For the graph with $n$ vertices and $1 \ll k < n/2$ particles we show that the mixing time is of order $(n/2)\log\min(k,\sqrt{n})$, and that around this time, for any small positive $\epsilon$, the total variation distance drops from $1-\epsilon$ to $\epsilon$ in a time window whose width is of order $n$ (i.e. in a much shorter time). Our proof is purely probabilistic and self-contained.
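To get a feel for why a window of width of order n is "much shorter" than the mixing time, the sketch below evaluates the leading-order expression from the abstract and compares it with n. "Of order" hides constant factors, so the numbers indicate scaling only, not exact mixing times. The function names `mixing_time_scale` and `window_to_mixing_ratio` are hypothetical.

```python
import math


def mixing_time_scale(n: int, k: int) -> float:
    """Leading-order scale (n/2) * log(min(k, sqrt(n))) quoted in the abstract.

    Constants hidden by 'of order' are ignored (assumption of this sketch).
    """
    if not (1 < k < n / 2):
        raise ValueError("expected 1 < k < n/2")
    return (n / 2) * math.log(min(k, math.sqrt(n)))


def window_to_mixing_ratio(n: int, k: int) -> float:
    """Ratio of the window scale n to the mixing-time scale.

    Algebraically this equals 2 / log(min(k, sqrt(n))), which shrinks
    as both n and k grow.
    """
    return n / mixing_time_scale(n, k)


if __name__ == "__main__":
    for n, k in [(10**4, 10**3), (10**6, 10**5), (10**8, 10**7)]:
        print(n, k, mixing_time_scale(n, k), window_to_mixing_ratio(n, k))
```

The ratio decays only logarithmically, so the separation between the window and the mixing time becomes visible only for large n and k.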