
Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering

2018-10-01 · WS 2018

Philipp Koehn, Huda Khayrallah, Kenneth Heafield, Mikel L. Forcada

Abstract

We posed the shared task of assigning sentence-level quality scores to a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems. Seventeen participants from companies, national research labs, and universities took part in this task.
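
A minimal Python sketch of the sub-selection step described above, under stated assumptions: each sentence pair already has a quality score, higher means better, and the 1% or 10% share is measured in target-side words (whitespace tokens). The function name `subselect` is hypothetical, and this is an illustration, not the organizers' evaluation code.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def subselect(pairs: List[Pair], scores: List[float], share: float) -> List[Pair]:
    """Hypothetical helper: keep the highest-scoring pairs until `share`
    of all target-side words is covered (assumption: share counted in words)."""
    if len(pairs) != len(scores):
        raise ValueError("need exactly one score per sentence pair")
    total_words = sum(len(tgt.split()) for _, tgt in pairs)
    budget = share * total_words
    # Rank by descending score; Python's sort is stable, so ties keep corpus order.
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    selected: List[Pair] = []
    used = 0
    for i in order:
        n = len(pairs[i][1].split())
        # Stop at the first pair that would push the subset past the word budget.
        if used + n > budget:
            break
        selected.append(pairs[i])
        used += n
    return selected


if __name__ == "__main__":
    corpus = [("Hallo Welt", "hello world"),
              ("xx", "buy cheap pills now now"),
              ("Guten Morgen", "good morning"),
              ("Danke", "thanks")]
    quality = [0.9, 0.1, 0.8, 0.7]
    print(subselect(corpus, quality, 0.5))
```

Stopping at the first pair that overflows the budget keeps the subset a strict top-k prefix of the ranking; an alternative would be to skip that pair and keep filling with shorter, lower-scored pairs.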