Creating a Corpus for Russian Data-to-Text Generation Using Neural Machine Translation and Post-Editing

2019-08-01 · WS 2019

Anastasia Shimorina, Elena Khasanova, Claire Gardent

Code

gitlab.com/shimorina/bsnlp-2019
Official implementation (linked in the paper)

Abstract

In this paper, we propose an approach for semi-automatically creating a data-to-text (D2T) corpus for Russian that can be used to learn a D2T natural language generation model. An error analysis of the output of an English-to-Russian neural machine translation system shows that 80% of the automatically translated sentences contain an error and that 53% of all translation errors bear on named entities (NEs). We therefore focus on named entities and introduce two post-editing techniques for correcting wrongly translated NEs.
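
The abstract does not describe how the two post-editing techniques work. As a rough illustration of what dictionary-based NE post-editing can look like, the Python sketch below assumes a hypothetical bilingual lexicon mapping each English entity from the input data to a reference Russian form, and uses character-level fuzzy matching (the standard library's difflib.SequenceMatcher) to locate a near-miss translation in the MT output and overwrite it. This is not the authors' method; the function names, the lexicon, the example sentence, and the 0.5 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def contains_span(tokens, span):
    """True if `span` occurs as a contiguous token sequence in `tokens`."""
    n = len(span)
    return any(tokens[i:i + n] == span for i in range(len(tokens) - n + 1))


def best_matching_span(tokens, target, max_extra=1):
    """Find the token window most similar to `target` at the character level.

    Returns (start, end, score), where score is SequenceMatcher.ratio() in [0, 1].
    Windows are at most len(target tokens) + max_extra tokens long.
    """
    max_len = len(target.split()) + max_extra
    best = (0, 0, 0.0)
    for start in range(len(tokens)):
        for end in range(start + 1, min(len(tokens), start + max_len) + 1):
            candidate = " ".join(tokens[start:end]).lower()
            score = SequenceMatcher(None, candidate, target.lower()).ratio()
            if score > best[2]:
                best = (start, end, score)
    return best


def post_edit_entities(mt_output, lexicon, source_entities, threshold=0.5):
    """Overwrite near-miss NE translations with reference forms (illustrative).

    lexicon: hypothetical dict, English entity -> reference Russian form.
    source_entities: English entities that occur in the input data.
    threshold: hypothetical minimum similarity required before replacing.
    """
    tokens = mt_output.split()
    for entity in source_entities:
        reference = lexicon.get(entity)
        if reference is None:
            continue  # no reference form available for this entity
        ref_tokens = reference.split()
        if contains_span(tokens, ref_tokens):
            continue  # the entity is already translated as expected
        start, end, score = best_matching_span(tokens, reference)
        if score >= threshold:
            tokens[start:end] = ref_tokens  # replace the closest window
    return " ".join(tokens)


# Toy example: a misspelled transliteration of "Alan Bean".
print(post_edit_entities("Алан Бинн был астронавтом",
                         {"Alan Bean": "Алан Бин"}, ["Alan Bean"]))
# prints: Алан Бин был астронавтом
```

Two limitations of this sketch are worth noting. Russian NEs inflect for case, so overwriting a span with a single reference form can introduce agreement errors. An entity that was translated word by word rather than transliterated may share too few characters with the reference to clear the threshold, so this kind of matching mainly catches spelling-level NE errors.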

Tasks

Data-to-Text Generation, Machine Translation, Text Generation, Translation