Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer
Code
- github.com/pytorch/fairseq/tree/master/examples/mbart (official, PyTorch, ★ 0)
- github.com/huggingface/transformers (PyTorch, ★ 158,292)
- github.com/facebookresearch/GENRE (PyTorch, ★ 796)
- github.com/hyunwoongko/asian-bart (PyTorch, ★ 69)
- github.com/avmb/marian-mbart (★ 7)
- github.com/clmbrs/communication-translation (PyTorch, ★ 5)
- github.com/evtaktasheva/dependency_extraction (PyTorch, ★ 1)
- github.com/pwc-1/Paper-9/tree/main/2/mbart (MindSpore, ★ 0)
Abstract
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, the decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
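As a rough illustration of the denoising objective, the sketch below applies the two kinds of noise described for mBART, sentence permutation and span masking (text infilling), to a toy training instance. The masking ratio (35%) and span-length distribution (Poisson, λ = 3.5) follow the paper; the function name, word-level tokenization, and sampling scheme are simplifications for illustration, and the actual fairseq implementation operates on subword tokens and handles details such as zero-length spans differently.

```python
import numpy as np

def add_noise(sentences, mask_token="<mask>", mask_ratio=0.35,
              poisson_lambda=3.5, rng=None):
    """Illustrative BART-style noise for one training instance:
    permute the sentence order, then replace word spans (lengths
    sampled from Poisson(3.5), up to ~35% of words in total) with
    a single mask token each."""
    rng = rng or np.random.default_rng(0)
    sentences = list(sentences)
    rng.shuffle(sentences)                            # sentence permutation
    noised = []
    for sent in sentences:
        words = sent.split()
        budget = int(round(mask_ratio * len(words)))  # words left to mask
        out, i = [], 0
        while i < len(words):
            if budget > 0 and rng.random() < mask_ratio:
                # Text infilling: an entire span collapses to one mask
                # token, so the model must also predict the span length.
                span = max(1, min(int(rng.poisson(poisson_lambda)), budget))
                out.append(mask_token)
                i += span
                budget -= span
            else:
                out.append(words[i])
                i += 1
        noised.append(" ".join(out))
    return noised

# The seq2seq model is trained to reconstruct the original text:
# encoder input = noised instance, decoder target = original instance.
print(add_noise(["The cat sat on the mat .", "It was warm ."]))
```

In pre-training, the encoder consumes the noised text and the decoder regenerates the original; for translation, the same pre-trained weights directly initialize a standard MT model before fine-tuning (see the official fairseq example and the huggingface/transformers port listed above).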