arXiv 2507.09087

Deep Reinforcement Learning with Gradient Eligibility Traces

By Esraa Elelimy, Brett Daley, et al.

Published 2025-07-12

Achieving fast and stable off-policy learning in deep reinforcement learning (RL) is challenging. Most existing methods rely on semi-gradient temporal-difference (TD) methods for their simplicity and efficiency, but are consequently susceptible to divergence. While more principled approaches like Gradient TD (GTD) methods have strong convergence guarantees, they have rarely been used in deep RL. Recent work introduc…
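The sketch below makes the abstract's contrast concrete. It is not the paper's method, which concerns gradient eligibility traces in deep RL. It uses linear function approximation with λ = 0 and compares off-policy semi-gradient TD(0) against TDC, a textbook member of the Gradient TD family. The test case is the classic two-state counterexample: one state has feature 1, its successor has feature 2, the reward is 0, and only that single transition is ever updated. The function names, step sizes (alpha = 0.01, beta = 0.1), discount (gamma = 0.9), and step count are illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def semi_gradient_td0(w: Vector, x: Vector, r: float, x_next: Vector,
                      gamma: float, alpha: float, rho: float = 1.0) -> Vector:
    """Off-policy semi-gradient TD(0): w <- w + alpha * rho * delta * x.

    The bootstrap target r + gamma * w.x_next is treated as a constant,
    which is what makes this a *semi*-gradient update.
    """
    delta = r + gamma * dot(w, x_next) - dot(w, x)
    return [wi + alpha * rho * delta * xi for wi, xi in zip(w, x)]


def tdc(w: Vector, v: Vector, x: Vector, r: float, x_next: Vector,
        gamma: float, alpha: float, beta: float,
        rho: float = 1.0) -> Tuple[Vector, Vector]:
    """Linear TDC (a Gradient TD method) with auxiliary weights v.

    w <- w + alpha * rho * (delta * x - gamma * x_next * (v . x))
    v <- v + beta * rho * (delta - v . x) * x
    Both updates use the weights from before this step.
    """
    delta = r + gamma * dot(w, x_next) - dot(w, x)
    vx = dot(v, x)
    new_w = [wi + alpha * rho * (delta * xi - gamma * xni * vx)
             for wi, xi, xni in zip(w, x, x_next)]
    new_v = [vi + beta * rho * (delta - vx) * xi for vi, xi in zip(v, x)]
    return new_w, new_v


def run(steps: int = 5000, gamma: float = 0.9,
        alpha: float = 0.01, beta: float = 0.1) -> Tuple[float, float]:
    """Repeat the single transition (feature 1 -> feature 2, reward 0)."""
    x, x_next = [1.0], [2.0]
    w_td = [1.0]                      # semi-gradient TD weight
    w_tdc, v_tdc = [1.0], [0.0]       # TDC main and auxiliary weights
    for _ in range(steps):
        w_td = semi_gradient_td0(w_td, x, 0.0, x_next, gamma, alpha)
        w_tdc, v_tdc = tdc(w_tdc, v_tdc, x, 0.0, x_next, gamma, alpha, beta)
    return w_td[0], w_tdc[0]


if __name__ == "__main__":
    td_w, tdc_w = run()
    print(f"semi-gradient TD(0) weight: {td_w:.3e}")
    print(f"TDC weight:                 {tdc_w:.3e}")
```

With these settings each semi-gradient step multiplies the weight by 1 + alpha(2·gamma − 1) = 1.008, so it grows without bound. In TDC, the auxiliary weights v supply a correction term that pulls w toward 0, the point where the TD error vanishes. The larger auxiliary step size (beta > alpha) follows a common two-timescale convention for TDC and is an assumption here, not a setting taken from the paper.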