Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

2022-03-30

Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui

Code

github.com/hemingkx/specdec (Official, In paper, PyTorch, ★ 46)
github.com/hemingkx/gad (Official, In paper, PyTorch, ★ 46)

Abstract

We propose Speculative Decoding (SpecDec), for the first time ever, to formally study exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding. Speculative Decoding has two innovations: Spec-Drafter -- an independent model specially optimized for efficient and accurate drafting -- and Spec-Verification -- a reliable method for verifying the drafted tokens efficiently in the decoding paradigm. Experimental results on various seq2seq tasks including machine translation and abstractive summarization show our approach can achieve around 5× speedup for the popular Transformer architectures with comparable generation quality to beam search decoding, refreshing the impression that the draft-then-verify paradigm introduces only 1.4× to 2× speedup. In addition to the remarkable speedup, we also demonstrate 3 additional advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and codes are available at https://github.com/hemingkx/SpecDec.
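The draft-then-verify paradigm named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how Spec-Drafter or Spec-Verification work, so the sketch assumes a strict criterion (a drafted token is kept only if it equals the AR model's greedy choice), and the `toy_drafter` and `toy_verifier` functions are hypothetical stand-ins for real models.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interfaces (assumptions, not the paper's API):
# a drafter proposes k tokens at once; a verifier returns the AR model's
# greedy next token for a given prefix.
Drafter = Callable[[List[Token], int], List[Token]]
Verifier = Callable[[List[Token]], Token]


def greedy_decode(prefix: List[Token], verifier: Verifier,
                  max_len: int = 32, eos: Token = 0) -> Tuple[List[Token], int]:
    """Plain AR baseline: one verifier call (one decoding step) per token."""
    out, steps = list(prefix), 0
    while len(out) < max_len and (not out or out[-1] != eos):
        out.append(verifier(out))
        steps += 1
    return out, steps


def draft_then_verify(prefix: List[Token], drafter: Drafter, verifier: Verifier,
                      k: int = 4, max_len: int = 32,
                      eos: Token = 0) -> Tuple[List[Token], int]:
    """Draft k tokens, keep the longest prefix the AR model agrees with.

    Assumed criterion: strict equality with the AR model's greedy token.
    In a real Transformer the checks inside one iteration would run as a
    single parallel forward pass; here they are written as a loop.
    """
    out, iterations = list(prefix), 0
    while len(out) < max_len and (not out or out[-1] != eos):
        iterations += 1
        # Fall back to a single AR token if the drafter proposes nothing.
        draft = drafter(out, k) or [verifier(out)]
        for tok in draft:
            target = verifier(out)
            out.append(target)  # always emit the AR model's own token
            # Stop at the first mismatch, at end of sequence, or at max_len.
            if tok != target or target == eos or len(out) >= max_len:
                break
    return out, iterations


def toy_verifier(seq: List[Token]) -> Token:
    """Toy AR model: counts upward modulo 10, so 0 acts as end of sequence."""
    return (seq[-1] + 1) % 10


def toy_drafter(seq: List[Token], k: int) -> List[Token]:
    """Toy drafter: mimics the verifier but wrongly proposes 7 instead of 5."""
    toks, last = [], seq[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 5:
            nxt = 7  # deliberate drafting error
        toks.append(nxt)
        last = nxt
    return toks


baseline, ar_steps = greedy_decode([1], toy_verifier)
output, iterations = draft_then_verify([1], toy_drafter, toy_verifier, k=4)
print(output == baseline, ar_steps, iterations)
```

Under this strict criterion the output is identical to greedy AR decoding; any speedup comes only from needing fewer decoding iterations when most drafted tokens are accepted.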

Tasks

Abstractive Text Summarization, Machine Translation, Translation
