arXiv 2512.16922
Next-Embedding Prediction Makes Strong Vision Learners
By Sihan Xu, Ziqiao Ma, et al.
Published 2025-12-18
Inspired by the success of generative pretraining in natural language, we ask whether the same principles can yield strong self-supervised visual learners. Instead of training models to output features for downstream use, we train them to generate embeddings to perform predictive tasks directly. This work explores such a shift from learning representations to learning models. Specifically, models learn to predict fu…