Researcher profile

Martí Sánchez-Fibla

Martí Sánchez-Fibla contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
3topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2019arXiv

Individual specialization in multi-task environments with multiagent reinforcement learners

There is a growing interest in Multi-Agent Reinforcement Learning (MARL) as the first steps towards building general intelligent agents that learn to make low and high-level decisions in non-stationary complex environments in the presence of other agents. Previous results point us towards increased conditions for coordination, efficiency/fairness, and common-pool resource sharing. We further study coordination in multi-task environments where several rewarding tasks can be performed and thus agents don't necessarily need to perform well in all tasks, but under certain conditions may specialize. An observation derived from the study is that epsilon greedy exploration of value-based reinforcement learning methods is not adequate for multi-agent independent learners because the epsilon parameter that controls the probability of selecting a random action synchronizes the agents artificially and forces them to have deterministic policies at the same time. By using policy-based methods with independent entropy regularised exploration updates, we achieved a better and smoother convergence. Another result that needs to be further investigated is that with an increased number of agents specialization tends to be more probable.

preprint2019arXiv

Loss aversion fosters coordination among independent reinforcement learners

We study what are the factors that can accelerate the emergence of collaborative behaviours among independent selfish learning agents. We depart from the "Battle of the Exes" (BoE), a spatial repeated game from which human behavioral data has been obtained (by Hawkings and Goldstone, 2016) that we find interesting because it considers two cases: a classic game theory version, called ballistic, in which agents can only make one action/decision (equivalent to the Battle of the Sexes) and a spatial version, called dynamic, in which agents can change decision (a spatial continuous version). We model both versions of the game with independent reinforcement learning agents and we manipulate the reward function transforming it into an utility introducing "loss aversion": the reward that an agent obtains can be perceived as less valuable when compared to what the other got. We prove experimentally the introduction of loss aversion fosters cooperation by accelerating its appearance, and by making it possible in some cases like in the dynamic condition. We suggest that this may be an important factor explaining the rapid converge of human behaviour towards collaboration reported in the experiment of Hawkings and Goldstone.

preprint2019arXiv

Speeding up reinforcement learning by combining attention and agency features

When playing video-games we immediately detect which entity we control and we center the attention towards it to focus the learning and reduce its dimensionality. Reinforcement Learning (RL) has been able to deal with big state spaces, including states derived from pixel images in Atari games, but the learning is slow, depends on the brute force mapping from the global state to the action values (Q-function), thus its performance is severely affected by the dimensionality of the state and cannot be transferred to other games or other parts of the same game. We propose different transformations of the input state that combine attention and agency detection mechanisms which both have been addressed separately in RL but not together to our knowledge. We propose and benchmark different architectures including both global and local agency centered versions of the state and also including summaries of the surroundings. Results suggest that even a redundant global-local state network can learn faster than the global alone. Summarized versions of the state look promising to achieve input-size independence learning.