arXiv 2404.08471
Revisiting Feature Prediction for Learning Visual Representations from Video
By Adrien Bardes, Quentin Garrido, et al.
Published 2024-02-15
Wiki summary
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluate…
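To make the idea of a feature prediction objective concrete, here is a minimal sketch, not the authors' implementation. It assumes toy linear maps in place of the real networks, a mean-pooled context, and an L1 distance between predicted and target features; every name in it (`linear`, `feature_prediction_loss`, `enc_w`, `tgt_w`, `pred_w`) is hypothetical and chosen for illustration only. The point it illustrates is that the loss is computed between feature vectors of hidden patches, with no pixel reconstruction and no negative examples.

```python
def linear(x, w):
    """Apply a weight matrix w (a list of rows) to a vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def feature_prediction_loss(patches, mask, enc_w, tgt_w, pred_w):
    """Mean L1 distance between predicted and target features of masked patches.

    patches: list of patch vectors (stand-ins for video patch tokens)
    mask:    list of bools, True where a patch is hidden from the context encoder
    enc_w, tgt_w, pred_w: toy linear context encoder, target encoder, predictor
    (all assumptions for illustration, not the paper's architecture)
    """
    visible = [linear(p, enc_w) for p, m in zip(patches, mask) if not m]
    masked = [p for p, m in zip(patches, mask) if m]
    if not visible or not masked:
        raise ValueError("need at least one visible and one masked patch")

    # Toy context summary: average the encoded visible patches.
    dim = len(visible[0])
    context = [sum(v[i] for v in visible) / len(visible) for i in range(dim)]

    # The toy predictor emits one feature vector used for every masked patch.
    pred = linear(context, pred_w)

    total, count = 0.0, 0
    for p in masked:
        # Target features come from the target encoder applied to the hidden
        # patch; in training they would be treated as constants.
        target = linear(p, tgt_w)
        total += sum(abs(a - b) for a, b in zip(pred, target))
        count += len(target)
    return total / count


identity = [[1.0, 0.0], [0.0, 1.0]]
same_patches = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
mixed_patches = [[1.0, 2.0], [3.0, 2.0], [1.0, 0.0]]
hidden = [False, True, True]

zero_loss = feature_prediction_loss(same_patches, hidden, identity, identity, identity)
demo_loss = feature_prediction_loss(mixed_patches, hidden, identity, identity, identity)
```

With identical patches and identity weights the prediction matches the targets exactly, so the loss is zero; with differing hidden patches the loss is positive, which is the signal a real model would minimize.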