
Zero-shot North Korean to English Neural Machine Translation by Character Tokenization and Phoneme Decomposition

2020-07-01 · ACL 2020

Hwichan Kim, Tosho Hirasawa, Mamoru Komachi

Abstract

The primary limitation of North Korean to English translation is the lack of a parallel corpus; therefore, high translation accuracy cannot be achieved. To address this problem, we propose a zero-shot approach using South Korean data, which are remarkably similar to North Korean data. We train a neural machine translation model after tokenizing South Korean text at the character level and decomposing characters into phonemes. We demonstrate that our method can effectively learn North Korean to English translation and improves the BLEU score by 1.01 points over the baseline.
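The two preprocessing steps named in the abstract, character-level tokenization and decomposition of characters into phonemes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "phonemes" correspond to the conjoining jamo (initial consonant, vowel, optional final consonant) obtained from precomposed Hangul syllables by standard Unicode arithmetic. The function names `char_tokenize`, `decompose_syllable`, and `to_phonemes`, and the `▁` marker used for spaces, are hypothetical choices made for illustration.

```python
# Sketch only: assumes "phoneme decomposition" means splitting each precomposed
# Hangul syllable (U+AC00..U+D7A3) into its conjoining jamo.
HANGUL_BASE = 0xAC00      # code point of the first precomposed syllable
NUM_SYLLABLES = 11172     # 19 initials * 21 vowels * 28 finals (incl. "none")
NUM_VOWELS = 21
NUM_FINALS = 28

INITIALS = [chr(0x1100 + i) for i in range(19)]        # leading consonants
VOWELS = [chr(0x1161 + i) for i in range(NUM_VOWELS)]  # medial vowels
FINALS = [""] + [chr(0x11A8 + i) for i in range(27)]   # index 0 = no final

SPACE_MARK = "\u2581"  # hypothetical visible marker for spaces


def char_tokenize(text):
    """Split text into single-character tokens, marking spaces explicitly."""
    return [SPACE_MARK if ch == " " else ch for ch in text]


def decompose_syllable(ch):
    """Return the jamo of one Hangul syllable; other characters pass through."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < NUM_SYLLABLES:
        return [ch]
    initial, rest = divmod(index, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    jamo = [INITIALS[initial], VOWELS[vowel]]
    if final:
        jamo.append(FINALS[final])
    return jamo


def to_phonemes(text):
    """Character-tokenize, then flatten each character into its jamo."""
    return [j for ch in char_tokenize(text) for j in decompose_syllable(ch)]


if __name__ == "__main__":
    print(char_tokenize("한국 말"))
    print(to_phonemes("한국 말"))
```

Under this assumption, a sentence becomes a sequence drawn from a few dozen jamo symbols rather than thousands of distinct syllable blocks, so spelling differences between two varieties of a language tend to touch fewer tokens. Whether this matches the exact segmentation used in the paper is not stated in the abstract.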
