arXiv 2410.11758

Latent Action Pretraining from Videos

By Seonghyeon Ye, Joel Jang, et al.

Published 2024-10-15

We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels typically collected by human teleoperators during pretraining, which significantly limits possible data sources and scale. In this work, we propose a method to learn fr…