arXiv 2509.25137
The Era of Real-World Human Interaction: RL from User Conversations
By Chuanyang Jin, Jing Xu, et al.
Published 2025-09-29
We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two complementary method…