arXiv 2511.20633

Reinforcing Action Policies by Prophesying

By Jiahui Zhang, Ze Huang, et al.

Published 2025-11-25

Vision-Language-Action (VLA) policies excel at aligning language, perception, and robot control. However, most VLAs are trained purely by imitation, which overfits to demonstrations and is brittle under distribution shift. Reinforcement learning (RL) directly optimizes task reward and thus addresses this misalignment, but real-robot interaction is expensive and conventional simulators are hard to engineer and trans…
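The two training signals contrasted above can be written in their standard textbook form. This is a generic sketch, not the paper's own formulation. The symbols are assumptions introduced here: $\mathcal{D}$ is a set of demonstrations, $\ell$ the language instruction, $\pi_\theta$ the policy, $r$ the task reward, $\gamma \in (0,1]$ a discount factor, and $T$ the episode horizon.

```latex
% Imitation (behavior cloning): match the demonstrated actions
\max_{\theta}\; \mathbb{E}_{(s,\,\ell,\,a)\sim\mathcal{D}}
  \big[\log \pi_{\theta}(a \mid s, \ell)\big]

% Reinforcement learning: maximize expected task return under the policy's own rollouts
\max_{\theta}\; \mathbb{E}_{\tau\sim\pi_{\theta}}
  \Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big]
```

The first objective only asks the policy to reproduce the demonstrated actions, whether or not they lead to task success, and it is evaluated only on states the demonstrations visit. The second scores the policy on the outcomes of its own rollouts, which is why it requires interaction with a real robot or a simulator.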