NICT Kyoto Submission for the WMT’21 Quality Estimation Task: Multimetric Multilingual Pretraining for Critical Error Detection

2021-11-01WMT (EMNLP) 2021Unverified0· sign in to hype

Raphael Rubino, Atsushi Fujita, Benjamin Marie

Unverified — Be the first to reproduce this paper.

Abstract

This paper presents the NICT Kyoto submission for the WMT’21 Quality Estimation (QE) Critical Error Detection shared task (Task 3). Our approach relies mainly on QE model pretraining for which we used 11 language pairs, three sentence-level and three word-level translation quality metrics. Starting from an XLM-R checkpoint, we perform continued training by modifying the learning objective, switching from masked language modeling to QE oriented signals, before finetuning and ensembling the models. Results obtained on the test set in terms of correlation coefficient and F-score show that automatic metrics and synthetic data perform well for pretraining, with our submissions ranked first for two out of four language pairs. A deeper look at the impact of each metric on the downstream task indicates higher performance for token oriented metrics, while an ablation study emphasizes the usefulness of conducting both self-supervised and QE pretraining.

Tasks

Language Modeling Language Modelling Masked Language Modeling Sentence Translation XLM-R

NICT Kyoto Submission for the WMT’21 Quality Estimation Task: Multimetric Multilingual Pretraining for Critical Error Detection

Abstract

Tasks

Reproductions