arXiv 2603.03251
Speculative Speculative Decoding
By Tanishq Kumar, Tri Dao, et al.
Published 2026-03-03
Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce…
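For context, the standard speculative decoding loop described in the abstract can be sketched as follows. This is a minimal illustration of the greedy variant of the baseline technique only, not the method this paper introduces. The toy `target_next` and `draft_next` functions, the vocabulary size, and the draft length `k` are hypothetical stand-ins for real models and settings.

```python
from typing import List, Tuple

VOCAB = 50  # hypothetical toy vocabulary size


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the slow target model's greedy next token."""
    return (sum(ctx) * 31 + len(ctx) * 7) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Toy stand-in for the fast draft model: agrees with the target
    except when the context length is 3 mod 4 (an arbitrary choice)."""
    tok = target_next(ctx)
    return (tok + 1) % VOCAB if len(ctx) % 4 == 3 else tok


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one target call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1) Speculate: the draft model proposes k tokens one at a time.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))

        # 2) Verify: score all k + 1 positions with the target model.
        #    A real system does this in one batched forward pass; the toy
        #    version loops but counts it as a single pass.
        target_passes += 1
        preds = [target_next(out + draft[:i]) for i in range(k + 1)]

        # 3) Accept the longest prefix on which draft and target agree,
        #    then append the target's own token at the first disagreement
        #    (or its bonus token if every draft token was accepted).
        n_ok = 0
        while n_ok < k and draft[n_ok] == preds[n_ok]:
            n_ok += 1
        out.extend(draft[:n_ok])
        out.append(preds[n_ok])

        # The next speculation round cannot begin until this verification
        # has finished: the sequential dependence the abstract refers to.
    return out[: len(prompt) + n_new], target_passes


tokens, passes = speculative_decode([1, 2, 3], n_new=20)
print(tokens == greedy_decode([1, 2, 3], 20), passes)
```

In this greedy variant the output matches plain greedy decoding from the target model exactly, while the number of verification rounds is smaller than the number of generated tokens whenever the draft model guesses at least some tokens correctly.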