arXiv 2603.03251
Speculative Speculative Decoding
By Tanishq Kumar, Tri Dao, et al.
Published 2026-03-03
Citation lineage
Review the prior work and downstream research connected to this paper.
Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce…