arXiv 2509.25137

The Era of Real-World Human Interaction: RL from User Conversations

By Chuanyang Jin, Jing Xu, et al.

Published 2025-09-29

We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. Current conversational models are aligned using pre-annotated, expert-generated human feedback. In this work, we introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations. We develop two complementary method…