Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction

2021-05-12 · ACL 2021 · Code Available

Gyubok Lee, Seongjun Yang, Edward Choi


Code

github.com/wns823/NMT_SSP
Official · In paper · PyTorch · ★ 11

Abstract

Accurate terminology translation is crucial for ensuring the practicality and reliability of neural machine translation (NMT) systems. To address this, lexically constrained NMT explores various methods to ensure pre-specified words and phrases appear in the translation output. However, in many cases, these methods are studied on general-domain corpora, where the terms are mostly uni- and bi-grams (>98%). In this paper, we instead tackle a more challenging setup consisting of domain-specific corpora with much longer n-grams and highly specialized terms. Inspired by the recent success of masked span prediction models, we propose a simple and effective training strategy that achieves consistent improvements on both terminology and sentence-level translation for three domain-specific corpora in two language pairs.
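
The two quantities the abstract refers to, whether pre-specified terms appear in the output and how long those terms are, can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names, the whitespace tokenization, the case-insensitive exact matching, and the example sentences are all assumptions made here for illustration.

```python
def term_usage_rate(hypotheses, constraints):
    """Fraction of constraint terms found verbatim (as a contiguous token
    sequence) in the corresponding hypothesis.

    Assumption: whitespace tokenization and case-insensitive exact match.
    """
    total = 0
    matched = 0
    for hyp, terms in zip(hypotheses, constraints):
        hyp_tokens = hyp.lower().split()
        for term in terms:
            term_tokens = term.lower().split()
            n = len(term_tokens)
            if n == 0:
                continue  # skip empty terms
            total += 1
            # Slide a window of length n over the hypothesis tokens.
            if any(hyp_tokens[i:i + n] == term_tokens
                   for i in range(len(hyp_tokens) - n + 1)):
                matched += 1
    return matched / total if total else 0.0


def short_term_share(constraints, max_n=2):
    """Share of constraint terms that are at most max_n tokens long."""
    lengths = [len(term.split()) for terms in constraints for term in terms]
    if not lengths:
        return 0.0
    return sum(1 for length in lengths if length <= max_n) / len(lengths)


# Hypothetical example data, not taken from the paper's corpora.
hypotheses = [
    "the patient received acetylsalicylic acid daily",
    "the judge issued a court order",
]
constraints = [
    ["acetylsalicylic acid"],
    ["preliminary injunction", "court order"],
]

usage = term_usage_rate(hypotheses, constraints)
share = short_term_share(constraints)
```

In this toy data, two of the three terms appear in the output, and every term is a bi-gram. A domain-specific term list with longer n-grams would lower the second number, which is the setting the abstract describes as more challenging.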

Tasks

Machine Translation NMT Sentence Translation
