arXiv 2507.11851

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

By Mohammad Samragh, Arnav Kundu, et al.

Published 2025-07-16

Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future t…
