
Knowledge Distillation based Ensemble Learning for Neural Machine Translation

2021-01-01

Chenze Shao, Meng Sun, Yang Feng, Zhongjun He, Hua Wu, Haifeng Wang


Abstract

Model ensemble can effectively improve the accuracy of neural machine translation, but at the cost of large computation and memory requirements. Additionally, model ensemble cannot combine the strengths of translation models with different decoding strategies, since their translation probabilities cannot be directly aggregated. In this paper, we introduce an ensemble learning framework based on knowledge distillation to aggregate the knowledge of multiple teacher models into a single student model. Under this framework, we introduce word-level ensemble learning and sequence-level ensemble learning for neural machine translation, where sequence-level ensemble learning is capable of aggregating translation models with different decoding strategies. Experimental results on multiple translation tasks show that, by combining the two ensemble learning methods, our approach achieves substantial improvements over competitive baseline systems and establishes a new single-model state-of-the-art BLEU score of 31.13 on the WMT14 English-German translation task. We will release the source code and the created SEL training data for reproducibility.
