arXiv 2512.16922

Next-Embedding Prediction Makes Strong Vision Learners

By Sihan Xu, Ziqiao Ma, et al.

Published 2025-12-18

Inspired by the success of generative pretraining in natural language, we ask whether the same principles can yield strong self-supervised visual learners. Instead of training models to output features for downstream use, we train them to generate embeddings to perform predictive tasks directly. This work explores such a shift from learning representations to learning models. Specifically, models learn to predict fu…
