Data augmentation using back-translation for context-aware neural machine translation

2019-11-01 · WS 2019

Amane Sugiyama, Naoki Yoshinaga


Abstract

A single sentence does not always convey enough information to translate it into another language. Some target languages require adding words that are omitted in the source language, or choosing more specific words where the source is ambiguous (e.g., zero pronouns when translating Japanese into English, or epicene pronouns when translating English into French). Translating such ambiguous sentences requires context beyond a single sentence, and context-aware neural machine translation (NMT) has so far been explored for this purpose. However, large parallel corpora for training accurate context-aware NMT models are not easily available. In this study, we first obtain large-scale pseudo-parallel corpora by back-translating monolingual data, and then investigate their impact on the translation accuracy of context-aware NMT models. We evaluate context-aware NMT models trained with small parallel corpora and the large-scale pseudo-parallel corpora on English-Japanese and English-French datasets, and demonstrate the large impact of this data augmentation on context-aware NMT models.
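The augmentation step described above can be sketched as follows. This is a minimal illustration under assumptions that the abstract does not state: the monolingual data is in the target language and keeps its document order, a hypothetical `back_translate` function stands in for a trained target-to-source NMT model, the context is only the single preceding sentence, and `to_concat_input` with a `<SEP>` token is one hypothetical way of feeding that context to a model. It is not the authors' implementation.

```python
from typing import Callable, List, Tuple

# One example: (pseudo source context, pseudo source sentence, genuine target sentence)
Example = Tuple[str, str, str]


def build_pseudo_corpus(
    target_docs: List[List[str]],
    back_translate: Callable[[str], str],
) -> List[Example]:
    """Turn target-language monolingual documents into context-aware pseudo-parallel examples.

    Assumptions (hypothetical, not from the paper): each document is an ordered list
    of target-language sentences, `back_translate` is some target-to-source model,
    and the context is only the preceding sentence.
    """
    examples: List[Example] = []
    for doc in target_docs:
        # Back-translate sentence by sentence while keeping document order,
        # so neighbouring pseudo source sentences still form a usable context.
        pseudo_sources = [back_translate(sent) for sent in doc]
        for i, (src, tgt) in enumerate(zip(pseudo_sources, doc)):
            context = pseudo_sources[i - 1] if i > 0 else ""  # no context for the first sentence
            examples.append((context, src, tgt))
    return examples


def to_concat_input(example: Example, sep: str = "<SEP>") -> str:
    """Hypothetical model input: the context sentence and the current sentence joined by a separator."""
    context, src, _ = example
    return f"{context} {sep} {src}" if context else src


# Toy stand-in for an English-to-Japanese back-translation model (hypothetical).
_TOY_LEXICON = {"I saw him.": "彼を見た。", "He waved.": "手を振った。"}


def toy_back_translate(sentence: str) -> str:
    return _TOY_LEXICON[sentence]


if __name__ == "__main__":
    docs = [["I saw him.", "He waved."]]
    for ex in build_pseudo_corpus(docs, toy_back_translate):
        print(to_concat_input(ex), "->", ex[2])
    # 彼を見た。 -> I saw him.
    # 彼を見た。 <SEP> 手を振った。 -> He waved.
```

In the toy data, the pseudo source "手を振った。" has a zero pronoun, and only the preceding sentence "彼を見た。" indicates that the English target needs "He". Back-translating whole documents instead of shuffled sentences keeps this context available, while the target side remains genuine monolingual text.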

Tasks

Data Augmentation, Machine Translation, NMT, Sentence Translation
