Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions

2019-08-01 · WS 2019

Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, Juan Pino

Abstract

Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality 2% and 10% of the data to train machine translation systems. This year, the task tackled the low-resource conditions of Nepali-English and Sinhala-English. Eleven teams from companies, national research labs, and universities took part.
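The selection step described above (score every sentence pair, then keep the best-scoring slice of the corpus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the official task tooling. The scores are assumed to come from some participant's quality model and are simply passed in. The budget is assumed to be measured as a fraction of the English-side word count (whitespace tokens), and the function name `select_top_fraction` is hypothetical.

```python
from typing import List, Tuple


def select_top_fraction(
    pairs: List[Tuple[str, str]],
    scores: List[float],
    fraction: float,
) -> List[Tuple[str, str]]:
    """Keep the highest-scoring (source, English) pairs until `fraction`
    of the corpus's English words is used up.

    Assumption: the budget is counted in whitespace-separated English words;
    this helper is hypothetical, not the shared task's official script.
    """
    if len(pairs) != len(scores):
        raise ValueError("need exactly one score per sentence pair")
    # Total size of the corpus, measured on the English (second) side.
    total_words = sum(len(en.split()) for _, en in pairs)
    budget = fraction * total_words
    # Visit pairs from best to worst score; sorted() stays stable with
    # reverse=True, so tied scores keep their original order.
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    selected: List[Tuple[str, str]] = []
    used = 0
    for i in order:
        n = len(pairs[i][1].split())
        # Stop at the first pair that would exceed the word budget.
        if used + n > budget:
            break
        selected.append(pairs[i])
        used += n
    return selected


# Placeholder strings stand in for real Nepali/Sinhala-English pairs.
pairs = [("ne1", "a b c"), ("ne2", "d e"), ("ne3", "f g h i"), ("ne4", "j")]
scores = [0.9, 0.1, 0.7, 0.5]
print(select_top_fraction(pairs, scores, 0.5))
```

Stopping at the first pair that overflows keeps the selection a strict score prefix; a variant that skips oversized pairs and keeps filling the budget with shorter ones is equally plausible.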
