Researcher profile

Antoine Cully

Antoine Cully contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization

A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real applications such as robotics. Quality-Diversity (QD) methods are evolutionary algorithms designed for this purpose. This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches to produce a collection of diverse and high-performing neural policies in continuous control environments. The main contribution of this work is the introduction of a Diversity Policy Gradient (DPG) that exploits information at the time-step level to drive policies towards more diversity in a sample-efficient manner. Specifically, QDPG selects neural controllers from a MAP-Elites grid and uses two gradient-based mutation operators to improve both quality and diversity. Our results demonstrate that QDPG is significantly more sample-efficient than its evolutionary competitors.

preprint2022arXiv

Hierarchical Quality-Diversity for Online Damage Recovery

Adaptation capabilities, like damage recovery, are crucial for the deployment of robots in complex environments. Several works have demonstrated that using repertoires of pre-trained skills can enable robots to adapt to unforeseen mechanical damages in a few minutes. These adaptation capabilities are directly linked to the behavioural diversity in the repertoire. The more alternatives the robot has to execute a skill, the better are the chances that it can adapt to a new situation. However, solving complex tasks, like maze navigation, usually requires multiple different skills. Finding a large behavioural diversity for these multiple skills often leads to an intractable exponential growth of the number of required solutions. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. The experiments with a hexapod robot show that our method solves maze navigation tasks with 20% less actions in the most challenging scenarios than the best baseline while having 57% less complete failures.

preprint2022arXiv

Learning to Walk Autonomously via Reset-Free Quality-Diversity

Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select of behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types solutions available for the behaviour selection policy over solutions optimised with a specific objective. Videos and code available at https://sites.google.com/view/rf-qd.

preprint2022arXiv

Multi-Objective Quality Diversity Optimization

In this work, we consider the problem of Quality-Diversity (QD) optimization with multiple objectives. QD algorithms have been proposed to search for a large collection of both diverse and high-performing solutions instead of a single set of local optima. Thriving for diversity was shown to be useful in many industrial and robotics applications. On the other hand, most real-life problems exhibit several potentially antagonist objectives to be optimized. Hence being able to optimize for multiple objectives with an appropriate technique while thriving for diversity is important to many fields. Here, we propose an extension of the MAP-Elites algorithm in the multi-objective setting: Multi-Objective MAP-Elites (MOME). Namely, it combines the diversity inherited from the MAP-Elites grid algorithm with the strength of multi-objective optimizations by filling each cell with a Pareto Front. As such, it allows to extract diverse solutions in the descriptor space while exploring different compromises between objectives. We evaluate our method on several tasks, from standard optimization problems to robotics simulations. Our experimental evaluation shows the ability of MOME to provide diverse solutions while providing global performances similar to standard multi-objective algorithms.

preprint2022arXiv

Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms

Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a behavioural descriptor to characterise the diversity that is hand-coded, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities; a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.

preprint2022arXiv

Unsupervised Behaviour Discovery with Quality-Diversity Optimisation

Quality-Diversity algorithms refer to a class of evolutionary algorithms designed to find a collection of diverse and high-performing solutions to a given problem. In robotics, such algorithms can be used for generating a collection of controllers covering most of the possible behaviours of a robot. To do so, these algorithms associate a behavioural descriptor to each of these behaviours. Each behavioural descriptor is used for estimating the novelty of one behaviour compared to the others. In most existing algorithms, the behavioural descriptor needs to be hand-coded, thus requiring prior knowledge about the task to solve. In this paper, we introduce: Autonomous Robots Realising their Abilities, an algorithm that uses a dimensionality reduction technique to automatically learn behavioural descriptors based on raw sensory data. The performance of this algorithm is assessed on three robotic tasks in simulation. The experimental results show that it performs similarly to traditional hand-coded approaches without the requirement to provide any hand-coded behavioural descriptor. In the collection of diverse and high-performing solutions, it also manages to find behaviours that are novel with respect to more features than its hand-coded baselines. Finally, we introduce a variant of the algorithm which is robust to the dimensionality of the behavioural descriptor space.

preprint2021arXiv

Dynamics-Aware Quality-Diversity for Efficient Learning of Skill Repertoires

Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample inefficient and require millions of evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework to improve the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can then be used for continual acquisition of new skill repertoires. To do so, we incrementally train a deep dynamics model from experience obtained when performing skill discovery using QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show DA-QD is 20 times more sample efficient than existing QD approaches for skill discovery. Second, we demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show how DA-QD is useful and effective for solving a long horizon navigation task and for damage adaptation in the real world. Videos and source code are available at: https://sites.google.com/view/da-qd.

preprint2021arXiv

On the Just-In-Time Discovery of Profit-Generating Transactions in DeFi Protocols

In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DEFIPOSER-ARB and then create logical DeFi protocol models for a theorem prover in DEFIPOSER-SMT. While DEFIPOSER-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DEFIPOSER-SMT can detect more complicated profitable transactions. We estimate that DEFIPOSER-ARB and DEFIPOSER-SMT can generate an average weekly revenue of 191.48ETH (76,592USD) and 72.44ETH (28,976USD) respectively, with the highest transaction revenue being 81.31ETH(32,524USD) and22.40ETH (8,960USD) respectively. We further show that DEFIPOSER-SMT finds the known economic bZx attack from February 2020, which yields 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days, given 96 DeFi protocol actions, and 25 assets. Looking beyond the financial gains mentioned above, forks deteriorate the blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DEFIPOSER-ARB and DEFIPOSER-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874x. Given optimal adversarial strategies provided by a Markov Decision Process (MDP), we quantify the value threshold at which a profitable transaction qualifies as Miner ExtractableValue (MEV) and would incentivize MEV-aware miners to fork the blockchain. For instance, we find that on Ethereum, a miner with a hash rate of 10% would fork the blockchain if an MEV opportunity exceeds 4x the block reward.

preprint2020arXiv

Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations

The increasing importance of robots and automation creates a demand for learnable controllers which can be obtained through various approaches such as Evolutionary Algorithms (EAs) or Reinforcement Learning (RL). Unfortunately, these two families of algorithms have mainly developed independently and there are only a few works comparing modern EAs with deep RL algorithms. We show that Multidimensional Archive of Phenotypic Elites (MAP-Elites), which is a modern EA, can deliver better-performing solutions than one of the state-of-the-art RL methods, Proximal Policy Optimization (PPO) in the generation of locomotion controllers for a simulated hexapod robot. Additionally, extensive hyper-parameter tuning shows that MAP-Elites displays greater robustness across seeds and hyper-parameter sets. Generally, this paper demonstrates that EAs combined with modern computational resources display promising characteristics and have the potential to contribute to the state-of-the-art in controller learning.