arXiv 2509.25137
The Era of Real-World Human Interaction: RL from User Conversations
By Chuanyang Jin, Jing Xu, et al.
Published 2025-09-29
We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two complementary method…