arXiv 1706.03741
Deep reinforcement learning from human preferences
By Paul Christiano, Jan Leike, et al.
Published 2017-06-12
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulate…