arXiv 2603.03251

Speculative Speculative Decoding

By Tanishq Kumar, Tri Dao, et al.

Published 2026-03-03

Citation lineage

Review the prior work and downstream research connected to this paper.

Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce…

View the original paper on arXiv