arXiv 2404.08471

Revisiting Feature Prediction for Learning Visual Representations from Video

By Adrien Bardes, Quentin Garrido, et al.

Published 2024-02-15

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluate…
