arXiv 2507.11851
Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
By Mohammad Samragh, Arnav Kundu, et al.
Published 2025-07-16
Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future t…