arXiv 2509.14252

LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

By Hai Huang, Yann LeCun, et al.

Published 2025-09-11

Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterparts. That mismatch in how training is achieved between language and vision opens up a natural question:…
