How Should Markup Tags Be Translated?

2020-11-01 · WMT (EMNLP) 2020

Greg Hanneman, Georgiana Dinu

Code

github.com/amazon-research/mt-markup-tags
Official

Abstract

The ability of machine translation (MT) models to correctly place markup is crucial to generating high-quality translations of formatted input. This paper compares two commonly used methods of representing markup tags and tests the ability of MT models to learn tag placement via training data augmentation. We study the interactions of tag representation, data augmentation size, tag complexity, and language pair to show the drawbacks and benefits of each method. We construct and release new test sets containing tagged data for three language pairs of varying difficulty.
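To make the idea of tag representation concrete, here is a minimal sketch of two ways a tagged source sentence could be prepared for an MT model: stripping the tags and remembering where they were, or keeping them inline as ordinary tokens. This is an illustration only. The abstract does not spell out the two methods it compares, so the function names `strip_tags` and `inline_tags`, the regular expression, and the whitespace tokenization are all assumptions rather than the authors' implementation.

```python
import re

# Assumption: tags look like simple XML/HTML tags such as <b> or </b>.
TAG_RE = re.compile(r"</?[^<>]+>")
SPLIT_RE = re.compile(r"(</?[^<>]+>)")


def strip_tags(text):
    """Hypothetical helper: remove tags from the text.

    Returns the plain tokens plus a list of (token_index, tag) pairs,
    where token_index is the number of plain tokens seen before the tag.
    A separate step would be needed to place the tags in the translation.
    """
    tokens, tags = [], []
    for piece in SPLIT_RE.split(text):
        if TAG_RE.fullmatch(piece):
            tags.append((len(tokens), piece))
        else:
            tokens.extend(piece.split())
    return tokens, tags


def inline_tags(text):
    """Hypothetical helper: keep each tag in the sequence as a single token."""
    tokens = []
    for piece in SPLIT_RE.split(text):
        if TAG_RE.fullmatch(piece):
            tokens.append(piece)
        else:
            tokens.extend(piece.split())
    return tokens


example = "Click <b>Save</b> now"
plain_tokens, tag_positions = strip_tags(example)
inline_tokens = inline_tags(example)
```

With the stripped form, the model never sees markup and tag placement is decided outside the model; with the inline form, the model itself must learn where tags go, which is where tagged training data would matter.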

Tasks

Data Augmentation, Machine Translation, TAG, Translation
