arXiv 2511.20633

Reinforcing Action Policies by Prophesying

By Jiahui Zhang, Ze Huang, et al.

Published 2025-11-25

Wiki summary

Vision-Language-Action (VLA) policies excel at aligning language, perception, and robot control. However, most VLAs are trained purely by imitation, which overfits to demonstrations and is brittle under distribution shift. Reinforcement learning (RL) directly optimizes task reward and thus addresses this misalignment, but real-robot interaction is expensive and conventional simulators are hard to engineer and trans…
