arXiv 2509.25137

The Era of Real-World Human Interaction: RL from User Conversations

By Chuanyang Jin, Jing Xu, et al.

Published 2025-09-29


We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two complementary method…