Source author record

Roberta Raileanu

Roberta Raileanu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Machine Learning, astro-ph.GA

Catalog footprint

What is connected

7 works

3 topics

4 close collaborators

preprint · 2023 · arXiv

Exploration via Elliptical Episodic Bonuses

In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes. In this work, we show that the effectiveness of these methods critically relies on a count-based episodic term in their exploration bonus. As a result, despite their success in relatively simple, noise-free settings, these methods fall short in more realistic scenarios where the state space is vast and prone to noise. To address this limitation, we introduce Exploration via Elliptical Episodic Bonuses (E3B), a new method which extends count-based episodic bonuses to continuous state spaces and encourages an agent to explore states that are diverse under a learned embedding within each episode. The embedding is learned using an inverse dynamics model in order to capture controllable aspects of the environment. Our method sets a new state-of-the-art across 16 challenging tasks from the MiniHack suite, without requiring task-specific inductive biases. E3B also matches existing methods on sparse reward, pixel-based VizDoom environments, and outperforms existing methods in reward-free exploration on Habitat, demonstrating that it can scale to high-dimensional pixel-based observations and realistic environments.
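
The elliptical episodic bonus described above can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the bonus has the form b = phi^T C^{-1} phi, where C accumulates outer products of the embeddings seen so far in the current episode and starts at ridge * I. The class name, the `ridge` parameter, and the use of a Sherman-Morrison update are illustrative choices, and the embedding `phi` is taken as given rather than learned with an inverse dynamics model.

```python
class EllipticalEpisodicBonus:
    """Sketch of an elliptical bonus b = phi^T C^{-1} phi, reset every episode."""

    def __init__(self, dim, ridge=1.0):
        self.dim = dim
        self.ridge = ridge  # hypothetical regulariser: C starts as ridge * I
        self.reset()

    def reset(self):
        # Call at the start of each episode so the bonus stays episodic.
        self.c_inv = [[(1.0 / self.ridge) if i == j else 0.0
                       for j in range(self.dim)] for i in range(self.dim)]

    def bonus(self, phi):
        # u = C^{-1} phi and b = phi^T u, computed before phi is added to C.
        u = [sum(row[j] * phi[j] for j in range(self.dim)) for row in self.c_inv]
        b = sum(phi[i] * u[i] for i in range(self.dim))
        # Sherman-Morrison rank-one update of C^{-1} for C <- C + phi phi^T.
        scale = 1.0 / (1.0 + b)
        for i in range(self.dim):
            for j in range(self.dim):
                self.c_inv[i][j] -= scale * u[i] * u[j]
        return b
```

With one-hot embeddings and `ridge=1.0`, this bonus reduces to 1 / (1 + n) for a state already seen n times in the episode, which is how it generalises a count-based episodic term.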

preprint · 2022 · arXiv

Backplay: "Man muss immer umkehren"

Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
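
The curriculum described above, which starts episodes near the end of a demonstration and moves the starting point backwards, might be sketched as follows. The linear schedule, the `width` parameter, and both function names are assumptions made for illustration; the abstract does not specify how the window is moved.

```python
import random


def backplay_window(demo_len, step, total_steps, width=4):
    """Return (lo, hi): how many steps before the demo's end a start may be.

    Assumes a linear schedule: the window slides from the end of the
    demonstration (step 0) back to its first state (step == total_steps).
    """
    frac = min(1.0, step / total_steps)
    hi = max(1, int(frac * demo_len))  # furthest allowed distance from the end
    lo = max(1, hi - width)            # nearest allowed distance from the end
    return lo, hi


def sample_start_index(demo_len, step, total_steps, width=4, rng=random):
    """Sample an index into the demonstration's state list to start from."""
    lo, hi = backplay_window(demo_len, step, total_steps, width)
    steps_back = rng.randint(lo, hi)   # inclusive on both ends
    return demo_len - steps_back       # index 0 is the environment's initial state
```

Early in training the sampled index sits one step before the end of the demonstration; by the end of training it lies within `width` steps of the initial state.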

preprint · 2021 · arXiv

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios, even when they are trained on many instances of semantically similar environments. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. However, different tasks tend to benefit from different kinds of data augmentation. In this paper, we compare three approaches for automatically finding an appropriate augmentation. These are combined with two novel regularization terms for the policy and value function, required to make the use of data augmentation theoretically sound for certain actor-critic algorithms. We evaluate our methods on the Procgen benchmark, which consists of 16 procedurally-generated environments, and show that they improve test performance by ~40% relative to standard RL algorithms. Our agent outperforms other baselines specifically designed to improve generalization in RL. In addition, we show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent, such as the background. Our implementation is available at https://github.com/rraileanu/auto-drac.
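
One plausible reading of the two regularization terms is sketched below. It assumes the policy term is a KL divergence between the action distributions for an observation s and its augmented version f(s), and the value term is the squared difference of the two value estimates. The abstract does not state these forms, so the function names, the weight `alpha`, and the way the terms are added to a loss are all illustrative assumptions rather than the released implementation.

```python
import math


def policy_regulariser(pi_orig, pi_aug, eps=1e-12):
    """KL(pi(.|s) || pi(.|f(s))) for two discrete action distributions."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(pi_orig, pi_aug) if p > 0.0)


def value_regulariser(v_orig, v_aug):
    """Squared difference between V(s) and V(f(s))."""
    return (v_aug - v_orig) ** 2


def augmented_loss(rl_loss, pi_orig, pi_aug, v_orig, v_aug, alpha=0.1):
    """Add both penalties to a base actor-critic loss (alpha is hypothetical)."""
    penalty = policy_regulariser(pi_orig, pi_aug) + value_regulariser(v_orig, v_aug)
    return rl_loss + alpha * penalty
```

Both penalties vanish when the policy and value function are invariant to the augmentation, so the base loss is left unchanged in that case.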

preprint · 2021 · arXiv

Learning with AMIGo: Adversarially Motivated Intrinsic Goals

A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating -- as a form of meta-learning -- a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging -- yet achievable -- goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.
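
A "constructively adversarial" teacher objective could look roughly like the sketch below. It assumes the teacher is rewarded when the student reaches the proposed goal but needs at least `threshold` steps to do so, and is penalised when the goal is reached too quickly or not at all. The function name, the threshold rule, and the default values of `alpha` and `beta` are assumptions for illustration, since the abstract does not spell out the objective.

```python
def teacher_reward(steps_to_goal, threshold, alpha=1.0, beta=1.0):
    """Reward for the goal-proposing teacher after one student episode.

    steps_to_goal: number of steps the student needed, or None if the
    goal was never reached within the episode.
    """
    if steps_to_goal is None:
        return -beta   # goal was not achievable for the current student
    if steps_to_goal >= threshold:
        return alpha   # achievable, and hard enough to be useful
    return -beta       # reached too quickly, so the goal was too easy
```

Raising `threshold` as the student improves would push the teacher towards progressively harder goals, which is one way the curriculum mentioned in the abstract could emerge.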

preprint · 2020 · arXiv

Fast Adaptation via Policy-Dynamics Value Functions

Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training. PD-VF explicitly estimates the cumulative reward in a space of policies and environments. An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned. Then, a value function conditioned on both embeddings is trained. At test time, a few actions are sufficient to infer the environment embedding, enabling a policy to be selected by maximizing the learned value function (which requires no additional environment interaction). We show that our method can rapidly adapt to new dynamics on a set of MuJoCo domains. Code available at https://github.com/rraileanu/policy-dynamics-value-functions.
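
The test-time selection step, choosing a policy by maximizing a learned value function given an inferred environment embedding, can be sketched as below. The bilinear value model V = z_policy^T W z_env and the search over a finite list of candidate policy embeddings are simplifying assumptions; the function names and the `weights` matrix are hypothetical stand-ins for the learned components.

```python
def bilinear_value(z_policy, z_env, weights):
    """Hypothetical value model: V = z_policy^T W z_env."""
    return sum(z_policy[i] * weights[i][j] * z_env[j]
               for i in range(len(z_policy))
               for j in range(len(z_env)))


def select_policy(candidates, z_env, weights):
    """Return the index of the candidate policy embedding with the highest value.

    No environment interaction is needed here: only the value model is queried.
    """
    scores = [bilinear_value(z, z_env, weights) for z in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

For example, with an identity `weights` matrix the selected candidate is simply the policy embedding most aligned with the environment embedding.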

preprint · 2020 · arXiv

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning. Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage exploration. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state more than once. We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation. We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid, as well as on tasks with high-dimensional observations used in prior work. Our experiments demonstrate that this approach is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments. Furthermore, we analyze the learned behavior as well as the intrinsic reward received by our agent. In contrast to previous approaches, our intrinsic reward does not diminish during the course of training and it rewards the agent substantially more for interacting with objects that it can control.
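
An intrinsic reward for "significant changes in the learned state representation" can be sketched as the distance between consecutive state embeddings. The sketch assumes an L2 norm and an optional discount by the square root of an episodic visitation count; both the norm and the count-based discount are assumptions here, as the abstract does not state the exact form, and the embeddings are taken as given.

```python
import math


def impact_reward(phi_s, phi_next, episodic_count=1):
    """Intrinsic reward: ||phi(s') - phi(s)||_2 / sqrt(episodic_count).

    episodic_count is a hypothetical count of visits to s' within the
    current episode; leave it at 1 to use the raw embedding distance.
    """
    change = math.sqrt(sum((b - a) ** 2 for a, b in zip(phi_s, phi_next)))
    return change / math.sqrt(episodic_count)
```

Because the reward depends on how much an action changes the embedding rather than on how novel a state is, it need not decay as training proceeds.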

preprint · 2016 · arXiv

Superbubbles in the Multiphase ISM and the Loading of Galactic Winds

We use numerical simulations to analyze the evolution and properties of superbubbles (SBs), driven by multiple supernovae (SNe), that propagate into the two-phase (warm/cold), cloudy interstellar medium (ISM). We consider a range of mean background densities n_avg=0.1-10 cm^{-3} and intervals between SNe dt_sn=0.01-1 Myr, and follow each SB until the radius reaches (1-2)H, where H is the characteristic ISM disk thickness. Except for embedded dense clouds, each SB is hot until a time t_sf,m when the shocked warm gas at the outer front cools and forms an overdense shell. Subsequently, diffuse gas in the SB interior remains at T_h~10^6-10^7 K with expansion velocity v_h~10^2-10^3 km/s (both highest for low dt_sn). At late times, the warm shell gas velocities are several 10's to ~100 km/s. While shell velocities are too low to escape from a massive galaxy, they are high enough to remove substantial mass from dwarfs. Dense clouds are also accelerated, reaching a few to 10's of km/s. We measure the mass in hot gas per SN, M_h/N_SN, and the total radial momentum of the bubble per SN, p_b/N_SN. After t_sf,m, M_h/N_SN~10-100 M_sun (highest for low n_avg), while p_b/N_SN~0.7-3x10^5 M_sun km/s (highest for high dt_sn). If galactic winds in massive galaxies are loaded by the hot gas in SBs, we conclude that the mass-loss rates would generally be lower than star formation rates. Only if the SN cadence is much higher than typical in galactic disks, as may occur for nuclear starbursts, can SBs break out while hot and expel up to 10 times the mass locked up in stars. The momentum injection values, p_b/N_SN, are consistent with requirements to control star formation rates in galaxies at observed levels.
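
The comparison between hot-gas mass loss and star formation can be checked with a short calculation. It assumes roughly one supernova per 100 M_sun of stars formed, a commonly adopted value that is not stated in the abstract; the function name and that default are illustrative only.

```python
def hot_gas_mass_loading(hot_mass_per_sn, stellar_mass_per_sn=100.0):
    """Ratio of hot-gas mass per SN to stellar mass formed per SN.

    hot_mass_per_sn: M_h/N_SN in solar masses (10-100 in the abstract).
    stellar_mass_per_sn: ASSUMED stellar mass formed per supernova, in
    solar masses; this value is not taken from the abstract.
    """
    return hot_mass_per_sn / stellar_mass_per_sn


low = hot_gas_mass_loading(10.0)    # lower end of the quoted range
high = hot_gas_mass_loading(100.0)  # upper end of the quoted range
```

Under that assumption the ratio spans about 0.1 to 1, in line with the statement that mass-loss rates carried by hot gas would generally be lower than star formation rates.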
