Tilde's Parallel Corpus Filtering Methods for WMT 2018

2018-10-01 · WS 2018

Mārcis Pinnis

Abstract

The paper describes parallel corpus filtering methods that reduce the noise in noisy "parallel" corpora from a level at which the corpora are unusable for neural machine translation training (i.e., the resulting systems fail to achieve reasonable translation quality, scoring well below 10 BLEU points) to a level at which the trained systems show decent translation quality (over 20 BLEU points on a 10 million word dataset and up to 30 BLEU points on a 100 million word dataset). The paper also documents Tilde's submissions to the WMT 2018 shared task on parallel corpus filtering.
