arXiv 2409.12917

Training Language Models to Self-Correct via Reinforcement Learning

By Aviral Kumar, Vincent Zhuang, et al.

Published 2024-09-19

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that…