arXiv 2509.25137

The Era of Real-World Human Interaction: RL from User Conversations

By Chuanyang Jin, Jing Xu, et al.

Published 2025-09-29

We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two complementary method…
