Transformer on a Diet

2020-02-14

Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

Code

github.com/cgraywang/transformer-on-diet
MXNet, ★ 31

Abstract

The Transformer has been widely used thanks to its ability to capture sequence information efficiently. However, recent developments such as BERT and GPT-2 deliver only heavy architectures that focus on effectiveness. In this paper, we explore three carefully designed light Transformer architectures to determine whether a Transformer with less computation can produce competitive results. Experimental results on language modeling benchmark datasets suggest that this trade-off is promising: at best, the light Transformer reduces the parameter count by 70% while obtaining perplexity competitive with the standard Transformer. The source code is publicly available.
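
The two quantities the abstract compares, perplexity and parameter reduction, can be illustrated with a minimal Python sketch. This is not taken from the paper's code: the function names and all numbers in the usage lines are hypothetical, and perplexity is assumed to be the exponential of the mean negative log-likelihood per token, using natural logarithms.

```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities the model assigned to each
    reference token (hypothetical input format, not the paper's).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def parameter_reduction(baseline_params, light_params):
    """Fraction of parameters removed relative to the baseline model."""
    if baseline_params <= 0:
        raise ValueError("baseline_params must be positive")
    return 1.0 - light_params / baseline_params


# Illustrative numbers only; they are not results from the paper.
uniform_log_probs = [math.log(0.25)] * 4  # model gives each token p = 0.25
ppl = perplexity(uniform_log_probs)
reduction = parameter_reduction(100_000_000, 30_000_000)
```

A lower perplexity means the model assigns higher probability to the held-out text, so a light model is competitive when its perplexity stays close to the baseline's while its parameter reduction is large.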

Tasks

Language Modeling