arXiv 2506.09985
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
By Mido Assran, Adrien Bardes, et al.
Published 2025-06-11
A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-predictive architec…