Implicit Temporal Differences

In reinforcement learning, the TD(λ) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD(λ) is its sensitivity to the choice of the step-size. It is well known empirically that a large step-size leads to fast convergence, at the cost of higher variance and a risk of instability. In this work, we introduce the implicit TD(λ) algorithm, which serves the same purpose and has the same computational cost as TD(λ), but is significantly more stable. We provide a theoretical explanation of this stability and an empirical evaluation of implicit TD(λ) on typical benchmark tasks. Our results show that implicit TD(λ) outperforms standard TD(λ) and a state-of-the-art method that automatically tunes the step-size, and is therefore promising for wide application.
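The abstract does not state the update rule, so the following is only a minimal sketch under stated assumptions, not necessarily the paper's exact algorithm. It assumes linear value features φ, accumulating eligibility traces z, and an "implicit" variant in which the current-state value φᵀθ inside the TD error is evaluated at the new parameters θ'. Solving that linear equation in closed form (Sherman-Morrison) gives the ordinary TD(λ) update with the data-dependent step size α / (1 + α φᵀz). For non-negative features this step stays bounded however large α is, which is one way to see where extra stability could come from. The function names `td_lambda_step` and `run_self_loop` and the one-state test problem are hypothetical.

```python
import numpy as np


def td_lambda_step(theta, z, phi, phi_next, reward, alpha, gamma, lam, implicit=False):
    """One linear TD(lambda) update with accumulating eligibility traces.

    Standard: theta' = theta + alpha * delta * z.
    Implicit (assumed variant): phi @ theta in the TD error is evaluated at the
    new parameters theta'. Solving that equation with the Sherman-Morrison
    formula reduces it to the standard update with step size
    alpha / (1 + alpha * phi @ z).
    """
    z = gamma * lam * z + phi                                 # accumulating trace
    delta = reward + gamma * phi_next @ theta - phi @ theta   # TD error at theta_t
    step = alpha / (1.0 + alpha * (phi @ z)) if implicit else alpha
    return theta + step * delta * z, z


def run_self_loop(alpha, implicit, gamma=0.9, lam=0.5, steps=200):
    """Hypothetical test MRP: one state that loops to itself with reward 1.

    The true value is 1 / (1 - gamma) = 10 for the default gamma.
    """
    phi = np.array([1.0])             # single tabular (one-hot) feature
    theta, z = np.zeros(1), np.zeros(1)
    for _ in range(steps):
        theta, z = td_lambda_step(theta, z, phi, phi, 1.0, alpha, gamma, lam, implicit)
    return theta[0]


if __name__ == "__main__":
    for implicit in (False, True):
        label = "implicit" if implicit else "standard"
        print(label, run_self_loop(alpha=25.0, implicit=implicit))
```

With α = 25 on this toy problem, the standard estimate oscillates with growing magnitude, while the implicit estimate settles near the true value 10. With a small step such as α = 1, both converge.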
