R-Drop: Regularized Dropout for Neural Networks

2021-06-28 · NeurIPS 2021 · Code Available

Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

Code

github.com/dropreg/R-Drop · Official, In paper · jax · ★ 881
github.com/cosmoquester/2021-dialogue-summary-competition · pytorch · ★ 128
github.com/bojone/r-drop · tf · ★ 91
github.com/fushengwuyu/R-Drop · pytorch · ★ 16
github.com/btobab/R-Drop · paddle · ★ 3
github.com/zbp-xxxp/R-Drop-Paddle · paddle · ★ 2
github.com/zpc-666/Paddle-R-Drop · paddle · ★ 2
github.com/wzh326/R-Drop · paddle · ★ 0

Abstract

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performances with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available at GitHub: https://github.com/dropreg/R-Drop.
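
The consistency term described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the one-layer `toy_forward` model, the `weights` matrix, and the `drop_prob` and `alpha` parameters are hypothetical names chosen for illustration. The sketch also assumes that the bidirectional KL term is added to the usual negative log-likelihood of both passes with a weight `alpha`, which the abstract itself does not spell out.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def bidirectional_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so the term is symmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def toy_forward(x, weights, drop_prob, rng):
    """Hypothetical one-layer model: inverted dropout on the inputs,
    followed by a linear map to class logits."""
    keep = 1.0 - drop_prob
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    return [sum(w * d for w, d in zip(row, dropped)) for row in weights]


def r_drop_loss(x, label, weights, drop_prob=0.1, alpha=1.0, rng=random):
    """Assumed objective: NLL of both passes plus alpha times the
    bidirectional KL between their output distributions."""
    # Two forward passes over the same input; each draws its own dropout
    # mask, so each pass corresponds to a different sub model.
    p1 = softmax(toy_forward(x, weights, drop_prob, rng))
    p2 = softmax(toy_forward(x, weights, drop_prob, rng))
    nll = -math.log(p1[label]) - math.log(p2[label])
    return nll + alpha * bidirectional_kl(p1, p2)
```

With `drop_prob=0.0` the two passes are identical, the KL term vanishes, and the loss reduces to twice the ordinary negative log-likelihood. With dropout enabled, the KL term penalizes disagreement between the two sampled sub models.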

Tasks

Abstractive Text Summarization, Image Classification, Language Modelling, Machine Translation, Translation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CNN / Daily Mail	BART + R-Drop	ROUGE-1	44.51	—	Unverified
