arXiv 2511.20633
Reinforcing Action Policies by Prophesying
By Jiahui Zhang, Ze Huang, et al.
Published 2025-11-25
Vision-Language-Action (VLA) policies excel at aligning language, perception, and robot control. However, most VLAs are trained purely by imitation, which overfits to demonstrations and is brittle under distribution shift. Reinforcement learning (RL) directly optimizes task reward and thus addresses this misalignment, but real-robot interaction is expensive and conventional simulators are hard to engineer and trans…