arXiv 2512.16922

Next-Embedding Prediction Makes Strong Vision Learners

By Sihan Xu, Ziqiao Ma, et al.

Published 2025-12-18


Inspired by the success of generative pretraining in natural language, we ask whether the same principles can yield strong self-supervised visual learners. Instead of training models to output features for downstream use, we train them to generate embeddings to perform predictive tasks directly. This work explores such a shift from learning representations to learning models. Specifically, models learn to predict fu…
