Very Deep Transformers for Neural Machine Translation
2020-08-18
Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao
Code
- github.com/namisan/exdeep-nmt (official, in paper) ★ 32
- github.com/LiyuanLucasLiu/Transformer-Clinic (PyTorch) ★ 332
- github.com/microsoft/deepnmt (PyTorch) ★ 31
Abstract
We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
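The initialization technique the abstract refers to is ADMIN, as the benchmark entries and the companion Transformer-Clinic repository indicate. Below is a minimal PyTorch sketch of the idea, assuming the ADMIN formulation from that line of work: each residual connection becomes y = x · ω + f(x), where ω is a per-dimension scale set by a short profiling pass so the skip path and the sub-layer output start at comparable magnitude. The names `AdminResidual` and `profile_omegas` are illustrative, not the authors' API.

```python
import torch
import torch.nn as nn

class AdminResidual(nn.Module):
    """ADMIN-style residual connection: y = x * omega + f(x).

    omega starts at 1 (a plain residual) and is overwritten by a
    profiling pass (below) before training begins.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.omega = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        return x * self.omega + sublayer_out

@torch.no_grad()
def profile_omegas(residuals, sublayers, x):
    """Hypothetical profiling helper: one forward pass over a data batch.

    omega_i is set to the square root of the accumulated variance of the
    outputs of all earlier sub-layers, so adding sub-layer i cannot blow
    up the variance of the residual stream -- the instability that makes
    very deep (e.g. 60-layer) Transformer stacks hard to train.
    """
    var_accum = 0.0
    for res, f in zip(residuals, sublayers):
        out = f(x)                                 # sub-layer output on the profiling batch
        res.omega.data.fill_(max(var_accum, 1.0) ** 0.5)
        x = res(x, out)                            # propagate through the rescaled residual
        var_accum += out.float().var().item()      # accumulate branch variance
```

After profiling, training proceeds as usual; the only change relative to a vanilla Transformer is the ω scaling on each skip connection, which, per the abstract, is what stabilizes the 60-encoder/12-decoder configuration.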
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| WMT2014 English-French | Transformer+BT (ADMIN init) | BLEU score | 46.4 | — | Unverified |
| WMT2014 English-French | Transformer (ADMIN init) | BLEU score | 43.8 | — | Unverified |
| WMT2014 English-German | Transformer (ADMIN init) | BLEU score | 30.1 | — | Unverified |