arXiv 1706.03741
Deep reinforcement learning from human preferences
By Paul Christiano, Jan Leike, et al.
Published 2017-06-12
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulate…
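A minimal sketch of how a human preference between two trajectory segments could be turned into a training signal for a reward model. It assumes a Bradley-Terry model over summed segment rewards: the predicted per-step rewards are added up for each segment, and the probability of preferring one segment is a logistic function of the difference. This is how the full paper frames the comparison, but the code below is an illustrative sketch, not the authors' implementation. The function names are hypothetical, and plain floats stand in for the outputs of a learned reward predictor r̂.

```python
import math
from typing import Sequence


def segment_return(predicted_rewards: Sequence[float]) -> float:
    """Sum the predicted per-step rewards r_hat(o_t, a_t) over one segment."""
    return math.fsum(predicted_rewards)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_sigmoid(x: float) -> float:
    """Stable log(sigmoid(x)), computed as -softplus(-x)."""
    y = -x
    return -(max(y, 0.0) + math.log1p(math.exp(-abs(y))))


def preference_prob(seg1: Sequence[float], seg2: Sequence[float]) -> float:
    """Bradley-Terry probability that segment 1 is preferred to segment 2.

    Equivalent to exp(R1) / (exp(R1) + exp(R2)), where R_i is the summed
    predicted reward of segment i.
    """
    return sigmoid(segment_return(seg1) - segment_return(seg2))


def preference_loss(seg1: Sequence[float], seg2: Sequence[float], mu1: float) -> float:
    """Cross-entropy between the model's preference and a human label.

    mu1 is the label mass on segment 1 (hypothetical encoding): 1.0 if the
    human preferred segment 1, 0.0 if they preferred segment 2, and 0.5 if
    they judged the two segments equally good.
    """
    d = segment_return(seg1) - segment_return(seg2)
    return -(mu1 * log_sigmoid(d) + (1.0 - mu1) * log_sigmoid(-d))


# Usage: segment 1 has a higher predicted return, so labelling it as
# preferred yields a smaller loss than labelling segment 2 as preferred.
seg_a = [1.0, 1.0]
seg_b = [0.0, 0.5]
print(preference_prob(seg_a, seg_b))
print(preference_loss(seg_a, seg_b, 1.0), preference_loss(seg_a, seg_b, 0.0))
```

Minimizing this loss over many labelled pairs pushes the reward predictor to assign higher summed reward to the segments people prefer. An RL agent can then be trained on the learned reward instead of a hand-written reward function. The log-space formulation avoids taking the log of a probability that has rounded to exactly 0 or 1.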