arXiv 2312.14925

A Survey of Reinforcement Learning from Human Feedback

By Timo Kaufmann, Paul Weng, et al.

Published 2023-12-22

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance…
