arXiv 2509.14252

LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

By Hai Huang, Yann LeCun, et al.

Published 2025-09-11

Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet in vision, embedding-space training objectives, e.g., those of Joint Embedding Predictive Architectures (JEPAs), have been observed to be far superior to their input-space counterparts. This mismatch between how language and vision models are trained raises a natural question:…
