Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

2021-05-04 · ACL (IWSLT) 2021

Toan Q. Nguyen, Kenton Murray, David Chiang

Abstract

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the improvement of about +1 BLEU across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
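To make the augmentation concrete, here is a minimal sketch of concatenation with randomly paired sentences. It assumes the new examples are built by joining two sentence pairs, source to source and target to target, with a plain space (no separator token), and that they are added alongside the original data. The function name `concat_augment` and this exact pairing scheme are illustrative assumptions, not the authors' code.

```python
import random
from typing import List, Tuple

# A (source sentence, target sentence) training pair.
Pair = Tuple[str, str]


def concat_augment(pairs: List[Pair], seed: int = 0) -> List[Pair]:
    """Return the original pairs followed by one concatenated pair per original.

    Hypothetical sketch: each original pair is joined with a randomly drawn
    partner pair (which may be itself; this sketch does not exclude that).
    Partners are drawn at random, so the added context is not discourse context.
    """
    rng = random.Random(seed)  # fixed seed keeps the augmentation reproducible
    augmented = list(pairs)    # keep all original pairs unchanged
    for src_a, tgt_a in pairs:
        src_b, tgt_b = rng.choice(pairs)
        # Concatenate source with source and target with target, space-separated.
        augmented.append((f"{src_a} {src_b}", f"{tgt_a} {tgt_b}"))
    return augmented


if __name__ == "__main__":
    data = [("guten Morgen", "good morning"), ("danke", "thanks"), ("ja", "yes")]
    for src, tgt in concat_augment(data, seed=1):
        print(src, "|||", tgt)
```

Because partners are sampled at random, each concatenated example pairs a sentence with unrelated context, is roughly twice as long as a single sentence, and places the second sentence at a shifted position. These loosely mirror the context diversity, length diversity, and position shifting named above, though this sketch makes no claim about how the paper isolates each factor.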

Tasks

Data Augmentation, Diversity, Low Resource Neural Machine Translation, Low-Resource Neural Machine Translation, Machine Translation, Position, Translation