arXiv 2410.11758
Latent Action Pretraining from Videos
By Seonghyeon Ye, Joel Jang, et al.
Published 2024-10-15
We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels during pretraining, and these labels are typically collected by human teleoperators, which significantly limits possible data sources and scale. In this work, we propose a method to learn fr…