arXiv 2507.09087

Deep Reinforcement Learning with Gradient Eligibility Traces

By Esraa Elelimy, Brett Daley, et al.

Published 2025-07-12

Achieving fast and stable off-policy learning in deep reinforcement learning (RL) is challenging. Most existing methods rely on semi-gradient temporal-difference (TD) methods for their simplicity and efficiency, but are consequently susceptible to divergence. While more principled approaches like Gradient TD (GTD) methods have strong convergence guarantees, they have rarely been used in deep RL. Recent work introduc…
