Researcher profile

Eduardo Alonso

Eduardo Alonso contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust: 17 (Unverified) · Verification: L1 · Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

Preprint · 2022 · arXiv

Efficient entity-based reinforcement learning

Recent successes in deep reinforcement learning (DRL) rely on end-to-end learning from fixed-size observational inputs (e.g. images, state variables). However, many challenging and interesting decision-making problems involve observations or intermediate representations that are best described as a set of entities: an image-based approach would miss small but important details (e.g. objects on a radar, vehicles in satellite images), the number of sensed objects is not fixed (e.g. robotic manipulation), or the problem simply cannot be represented meaningfully as an image (e.g. power grid control or logistics). Such structured representations are not directly compatible with current DRL architectures; however, machine learning techniques that directly target structured information have become increasingly common and could address this issue. We propose to combine recent advances in set representations with slot attention and graph neural networks to process structured data, broadening the range of applications of DRL algorithms. This approach makes it possible to address entity-based problems efficiently and scalably. We show that it can significantly improve training time and robustness, and demonstrate its potential to handle structured as well as purely visual domains on multiple environments from the Arcade Learning Environment (Atari) and Simple Playgrounds.
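The abstract does not give its architecture in code form. As a rough illustration of why set representations suit a varying number of entities, here is a minimal sketch of a permutation-invariant set encoder using Deep Sets-style sum pooling. This is a simpler technique swapped in for illustration, not the paper's slot-attention and graph-neural-network architecture; the sizes `FEATURES` and `HIDDEN`, the random weights and all function names are hypothetical.

```python
import math
import random

# Hypothetical sizes: each entity is a feature vector of length FEATURES.
FEATURES = 3
HIDDEN = 4

random.seed(0)
# PHI: per-entity encoder weights (HIDDEN x FEATURES); RHO: scalar readout weights.
PHI = [[random.uniform(-1, 1) for _ in range(FEATURES)] for _ in range(HIDDEN)]
RHO = [random.uniform(-1, 1) for _ in range(HIDDEN)]


def phi(entity):
    """Encode one entity independently (linear layer followed by tanh)."""
    return [math.tanh(sum(w * x for w, x in zip(row, entity))) for row in PHI]


def encode_set(entities):
    """Sum-pool per-entity embeddings: independent of order and set size."""
    pooled = [0.0] * HIDDEN
    for entity in entities:
        for i, h in enumerate(phi(entity)):
            pooled[i] += h
    return pooled


def value(entities):
    """Hypothetical scalar head (e.g. a value estimate) on the pooled embedding."""
    return sum(w * h for w, h in zip(RHO, encode_set(entities)))


objects = [[0.1, 0.2, 0.3], [1.0, -0.5, 0.0], [0.0, 0.0, 1.0]]
print(value(objects), value(objects[::-1]))  # same value up to float rounding
print(value(objects + [[0.5, 0.5, 0.5]]))    # a fourth entity needs no change
```

Because the per-entity embeddings are summed, reordering the entities or changing how many there are leaves the encoder's interface unchanged; attention or message passing between entities, as described in the abstract, would be layered on top of this idea.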

Preprint · 2019 · arXiv

The Stabilisation of Equilibria in Evolutionary Game Dynamics through Mutation: Mutation Limits in Evolutionary Games

The multi-population replicator dynamics (RD) can be viewed as a dynamic approach to studying both multi-player games, where it has been shown to be related to Cross' learning, and systems of coevolving populations. However, not all of its equilibria are Nash equilibria (NE) of the underlying game, and neither convergence to an NE nor convergence in general is guaranteed. Although interior equilibria are guaranteed to be NE, no interior equilibrium can be asymptotically stable in the multi-population RD, which can result, e.g., in cyclic orbits around a single interior NE. We introduce a new notion of equilibria of RD, called mutation limits, which is based on the inclusion of a naturally arising, simple form of mutation but is invariant under the specific choice of mutation parameters. We prove the existence of such mutation limits for a large range of games and consider a subclass of particular interest, that of attracting mutation limits. Attracting mutation limits are approximated by asymptotically stable equilibria of the (mutation-)perturbed RD and hence offer an approximate dynamic solution of the underlying game, especially if the original dynamic has no asymptotically stable equilibria. In this sense, mutation stabilises the system in certain cases and makes attracting mutation limits near-attainable. Furthermore, the relevance of attracting mutation limits as a game-theoretic equilibrium concept is emphasised by a similarity between the (mutation-)perturbed RD and the Q-learning algorithm in the context of multi-agent reinforcement learning. In contrast to the guaranteed existence of mutation limits, attracting mutation limits do not exist in all games, raising the question of their characterisation.
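To make the contrast between cycling and stabilisation concrete, the following sketch integrates two-population replicator dynamics for matching pennies, whose only NE is the interior point (0.5, 0.5). It assumes a uniform mutation term `mu * (1/2 - x)` added to each population's dynamics; this is an illustrative choice, not necessarily the paper's form of mutation, and the sketch does not compute mutation limits themselves. Function names are hypothetical.

```python
import math

# Matching pennies: payoff to player 1 for (row, col); player 2 receives the negative.
A = [[1.0, -1.0], [-1.0, 1.0]]


def rhs(x, y, mu):
    """Two-population RD plus an assumed uniform mutation term of rate mu.

    x: probability that player 1 plays 'heads'; y: the same for player 2.
    """
    # Payoff advantage of 'heads' over 'tails' in each population.
    adv1 = (A[0][0] * y + A[0][1] * (1 - y)) - (A[1][0] * y + A[1][1] * (1 - y))
    adv2 = -(A[0][0] * x + A[1][0] * (1 - x)) + (A[0][1] * x + A[1][1] * (1 - x))
    # Two-strategy replicator term x(1-x)*advantage, plus mutation toward 1/2.
    dx = x * (1 - x) * adv1 + mu * (0.5 - x)
    dy = y * (1 - y) * adv2 + mu * (0.5 - y)
    return dx, dy


def simulate(x, y, mu, dt=0.01, steps=10_000):
    """Integrate with the classic fourth-order Runge-Kutta scheme."""
    for _ in range(steps):
        k1 = rhs(x, y, mu)
        k2 = rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1], mu)
        k3 = rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1], mu)
        k4 = rhs(x + dt * k3[0], y + dt * k3[1], mu)
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y


def distance_to_ne(x, y):
    """Euclidean distance to the interior Nash equilibrium (0.5, 0.5)."""
    return math.hypot(x - 0.5, y - 0.5)


print(distance_to_ne(*simulate(0.8, 0.5, mu=0.0)))  # stays far from the NE
print(distance_to_ne(*simulate(0.8, 0.5, mu=0.1)))  # close to 0
```

Without mutation the trajectory circles the NE on a closed orbit; with `mu = 0.1` it spirals into the NE, which becomes asymptotically stable in the perturbed dynamics.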

Preprint · 2012 · arXiv

The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation

This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These examples differ from previous divergence examples in the literature in that they apply to a greedy policy, i.e. to a "value iteration" scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). In addition to these, we also obtain divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
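For readers unfamiliar with the phenomenon, the sketch below reproduces a classic two-state counterexample in the spirit of Tsitsiklis and Van Roy, in which value iteration combined with a least-squares linear fit diverges even though the true value function (zero everywhere) is exactly representable. It has a single action, so no greedy policy is involved; it is therefore the kind of earlier example the abstract contrasts with, not one of the paper's own examples. The function name and parameters are hypothetical.

```python
def fitted_value_iteration(gamma, w=1.0, iterations=100):
    """Two-state counterexample: V(s) = w * phi(s), phi(s1) = 1, phi(s2) = 2.

    Both states move deterministically to s2 with reward 0, so the true value
    is 0 everywhere and w = 0 is exactly representable.
    Returns the sequence of fitted weights.
    """
    phi = [1.0, 2.0]
    history = [w]
    for _ in range(iterations):
        # Bellman backup for each state: r + gamma * V(s2) = gamma * 2w.
        target = gamma * phi[1] * w
        targets = [target, target]
        # Least-squares refit: argmin_w' sum_s (w' * phi(s) - target(s))^2.
        w = sum(p * t for p, t in zip(phi, targets)) / sum(p * p for p in phi)
        history.append(w)
    return history


print(fitted_value_iteration(0.9)[-1])  # grows: factor 6 * 0.9 / 5 per step
print(fitted_value_iteration(0.8)[-1])  # shrinks toward 0
```

Each iteration multiplies `w` by `6 * gamma / 5`, so the fitted weight grows without bound whenever `gamma > 5/6` (for example `gamma = 0.9`), even though every individual fit is an exact least-squares solution.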

Preprint · 2011 · arXiv

The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning

In this theoretical paper we are concerned with the problem of learning a value function by a smooth general function approximator, to solve a deterministic episodic control problem in a large continuous state space. It is shown that learning the gradient of the value function at every point along a trajectory generated by a greedy policy is a sufficient condition for the trajectory to be locally extremal, and often locally optimal, and we argue that this brings greater efficiency to value-function learning. This contrasts with traditional value-function learning, in which the value function must be learnt over the whole of state space. It is also proven that policy-gradient learning applied to a greedy policy on a value function produces a weight update equivalent to a value-gradient weight update, which provides a surprising connection between these two alternative paradigms of reinforcement learning, as well as a convergence proof for control problems with a value function represented by a general smooth function approximator.