Compressing Transformer-Based Sequence to Sequence Models With Pre-trained Autoencoders for Text Summarization
Ala Alam Falaki, Robin Gras
Abstract
We propose a technique to reduce the number of decoder parameters in a sequence-to-sequence (seq2seq) architecture for automatic text summarization. The approach trains an AutoEncoder (AE) on top of a pre-trained encoder to reduce the dimension of the encoder's output, which in turn allows a significantly smaller decoder. We measure the effectiveness of this method with the ROUGE score, comparing four latent space dimensionality reductions: 96%, 66%, 50%, and 44%. Several well-known frozen pre-trained encoders (BART, BERT, and DistilBERT) are tested, each paired with its respective frozen pre-trained AE, to assess whether the reduced-dimension latent space can train a 3-layer transformer decoder. We also repeat the same experiments on a small transformer model trained for text summarization. With the DistilBERT encoder, this study shows a 5% increase in the R-1 score while reducing the model size by 44%, along with competitive scores and substantial size reductions for all the other models.
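The size savings come from the fact that a transformer decoder's parameter count grows roughly quadratically with its model dimension, so shrinking the latent space the decoder attends over shrinks the decoder itself. The back-of-the-envelope sketch below is not from the paper: it assumes a simplified decoder layer (self-attention plus cross-attention plus a feed-forward block with a 4x expansion, ignoring biases, layer norms, and embeddings) and a hypothetical 768-to-384 bottleneck (a 50% dimensionality reduction) to illustrate the scaling.

```python
def decoder_params(d_model: int, n_layers: int = 3, d_ff_mult: int = 4) -> int:
    """Rough parameter count for a simplified transformer decoder.

    Assumptions (illustrative only, not the paper's exact accounting):
    - self-attention: 4 * d_model^2 (Q, K, V, and output projections)
    - cross-attention: another 4 * d_model^2
    - feed-forward: two linear maps of size d_model x (d_ff_mult * d_model)
    - biases, layer norms, and embeddings are ignored
    """
    attention = 8 * d_model ** 2                      # self- + cross-attention
    feed_forward = 2 * d_model * (d_ff_mult * d_model)  # up- and down-projection
    return n_layers * (attention + feed_forward)

full = decoder_params(768)     # decoder sized to the raw encoder output
reduced = decoder_params(384)  # decoder sized to a 50%-reduced latent space

print(f"full decoder:    {full:,} params")
print(f"reduced decoder: {reduced:,} params")
print(f"ratio:           {reduced / full:.2f}")  # quadratic: (384/768)^2 = 0.25
```

Under these assumptions, halving the latent dimension cuts the decoder's weight count to a quarter, which is why even moderate bottlenecks translate into large end-to-end model size reductions.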