
BART-light: One Decoder Layer Is Enough

2021-09-17 · ACL ARR September 2021

Anonymous


Abstract

BART (Lewis et al., 2020), an encoder-decoder transformer language model (LM), has reached state-of-the-art results on several natural language generation and understanding tasks. Like other pretrained encoder-decoder LMs, it uses the same number of hidden layers in the encoder and the decoder. In this paper, we show that one can easily remove all but one or two decoder layers for text generation tasks, and even remove the whole decoder for classification tasks, with little to no compromise in performance. Our study shows that a shallow decoder is sufficient for most tasks when a deep encoder is used.
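To see why pruning decoder layers pays off, a back-of-the-envelope parameter count is useful. The sketch below is illustrative and not from the paper: it assumes BART-large dimensions (d_model = 1024, FFN dim = 4096, 12 encoder and 12 decoder layers) and counts only the weight matrices of each layer, ignoring biases, layer norms, and embeddings. A decoder layer is heavier than an encoder layer because it carries a cross-attention block in addition to self-attention.

```python
# Rough per-layer parameter count for a transformer layer (weights only;
# biases, layer norms, and embeddings are ignored for simplicity).
# `layer_params` is a hypothetical helper, not code from the paper.
def layer_params(d_model: int, d_ffn: int, cross_attention: bool) -> int:
    attn = 4 * d_model * d_model   # Q, K, V, and output projections
    ffn = 2 * d_model * d_ffn      # up- and down-projection of the FFN
    # A decoder layer has a second attention block (cross-attention).
    return attn * (2 if cross_attention else 1) + ffn

D_MODEL, D_FFN = 1024, 4096  # BART-large dimensions

enc = layer_params(D_MODEL, D_FFN, cross_attention=False)
dec = layer_params(D_MODEL, D_FFN, cross_attention=True)
saved = 11 * dec  # dropping 11 of the 12 decoder layers

print(f"encoder layer: {enc / 1e6:.1f}M params")   # ~12.6M
print(f"decoder layer: {dec / 1e6:.1f}M params")   # ~16.8M
print(f"saved by keeping 1 decoder layer: {saved / 1e6:.1f}M params")
```

Under these assumptions, keeping a single decoder layer removes roughly 185M of BART-large's ~400M parameters, which is the intuition behind the title.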
