SOTAVerified

Quality and Coverage: The AFRL Submission to the WMT19 Parallel Corpus Filtering for Low-Resource Conditions Task

2019-08-01WS 2019Unverified0· sign in to hype

Grant Erdmann, Jeremy Gwinnup

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

The WMT19 Parallel Corpus Filtering For Low-Resource Conditions Task aims to test various methods of filtering a noisy parallel corpora, to make them useful for training machine translation systems. This year the noisy corpora are the relatively low-resource language pairs of Nepali-English and Sinhala-English. This papers describes the Air Force Research Laboratory (AFRL) submissions, including preprocessing methods and scoring metrics. Numerical results indicate a benefit over baseline and the relative benefits of different options.

Tasks

Reproductions