Source author record

Jakub Łyskawa

Jakub Łyskawa appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning

Catalog footprint

What is connected

2works

1topics

2close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

ACERAC: Efficient reinforcement learning in fine time discretization

One of the main goals of reinforcement learning (RL) is to provide a~way for physical machines to learn optimal behavior instead of being programmed. However, effective control of the machines usually requires fine time discretization. The most common RL methods apply independent random elements to each action, which is not suitable in that setting. It is not feasible because it causes the controlled system to jerk, and does not ensure sufficient exploration since a~single action is not long enough to create a~significant experience that could be translated into policy improvement. In our view these are the main obstacles that prevent application of RL in contemporary control systems. To address these pitfalls, in this paper we introduce an RL framework and adequate analytical tools for actions that may be stochastically dependent in subsequent time instances. We also introduce an RL algorithm that approximately optimizes a~policy that produces such actions. It applies experience replay to adjust likelihood of sequences of previous actions to optimize expected $n$-step returns the policy yields. The efficiency of this algorithm is verified against four other RL methods (CDAU, PPO, SAC, ACER) in four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) in diverse time discretization. The algorithm introduced here outperforms the competitors in most cases considered.

preprint2020arXiv

A framework for reinforcement learning with autocorrelated actions

The subject of this paper is reinforcement learning. Policies are considered here that produce actions based on states and random elements autocorrelated in subsequent time instants. Consequently, an agent learns from experiments that are distributed over time and potentially give better clues to policy improvement. Also, physical implementation of such policies, e.g. in robotics, is less problematic, as it avoids making robots shake. This is in opposition to most RL algorithms which add white noise to control causing unwanted shaking of the robots. An algorithm is introduced here that approximately optimizes the aforementioned policy. Its efficiency is verified for four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER). The algorithm outperforms others in three of these problems.