
Selecting the best data filtering method for NMT training

2021-08-01, MTSummit 2021

Fred Bane, Anna Zaretskaya


Abstract

The performance of NMT systems has been shown to depend on the quality of the training data. In this paper we explore different open-source tools that can be used to score the quality of translation pairs, with the goal of obtaining clean corpora for training NMT models. We measure the performance of these tools in two ways: by correlating their scores with human scores, and by ranking models trained on the resulting filtered datasets according to their performance on different test sets and MT performance metrics.
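
As a rough illustration of the first evaluation step (correlating a tool's scores with human scores), the following sketch computes a Spearman rank correlation in plain Python. The abstract does not say which correlation coefficient the authors used, so Spearman is an assumption here, and the function names and the toy scores are hypothetical, not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: one filtering tool's scores and human ratings
# for the same five translation pairs (illustrative values only).
tool_scores = [0.91, 0.15, 0.64, 0.40, 0.78]
human_scores = [5, 1, 4, 2, 3]
rho = spearman(tool_scores, human_scores)
```

Under this assumption, a tool whose scores order the translation pairs similarly to the human ratings gets a coefficient close to 1, which gives one way to compare several scoring tools on the same annotated sample.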
