arXiv 2410.11758

Latent Action Pretraining from Videos

By Seonghyeon Ye, Joel Jang, et al.

Published 2024-10-15

We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing VLA models require action labels during pretraining, typically collected by human teleoperators, which significantly limits the possible data sources and scale. In this work, we propose a method to learn fr…