arXiv 2511.20633
Reinforcing Action Policies by Prophesying
By Jiahui Zhang, Ze Huang, et al.
Published 2025-11-25
Vision-Language-Action (VLA) policies excel at aligning language, perception, and robot control. However, most VLAs are trained purely by imitation, which overfits to demonstrations and is brittle under distribution shift. Reinforcement learning (RL) directly optimizes task reward and thus addresses this misalignment, but real-robot interaction is expensive and conventional simulators are hard to engineer and trans…