The JHU Parallel Corpus Filtering Systems for WMT 2018

2018-10-01 · WS 2018

Huda Khayrallah, Hainan Xu, Philipp Koehn

Abstract

This work describes our submission to the WMT18 Parallel Corpus Filtering shared task. We use a slightly modified version of the Zipporah Corpus Filtering toolkit (Xu and Koehn, 2017), which computes an adequacy score and a fluency score for each sentence pair, and use a weighted sum of the two scores as the selection criterion. Our system differs from standard Zipporah in that we experiment with computing the combination weights from the noisy corpus being filtered, which avoids generating synthetic data.
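
The selection criterion described in the abstract, a weighted sum of an adequacy score and a fluency score, can be illustrated with a minimal sketch. This is not the Zipporah implementation: the function names, the toy scores, and the weights below are all hypothetical, and the sketch assumes the per-pair scores have already been computed and that higher scores are better. How the weights are learned is not shown.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]
Scores = Tuple[float, float]  # (adequacy, fluency); assumed higher = better


def combined_score(adequacy: float, fluency: float,
                   w_adequacy: float, w_fluency: float) -> float:
    """Weighted sum of the two scores (weights are hypothetical inputs)."""
    return w_adequacy * adequacy + w_fluency * fluency


def select_top(pairs: List[SentencePair], scores: List[Scores],
               weights: Tuple[float, float], keep: int) -> List[SentencePair]:
    """Rank sentence pairs by combined score and keep the best `keep` pairs."""
    ranked = sorted(
        zip(pairs, scores),
        key=lambda item: combined_score(item[1][0], item[1][1], *weights),
        reverse=True,
    )
    return [pair for pair, _ in ranked[:keep]]


# Toy data: made-up scores for illustration only.
toy_pairs = [("guten Morgen", "good morning"),
             ("asdf qwer", "click here now"),
             ("das Haus ist alt", "the house is old")]
toy_scores = [(0.9, 0.8), (0.2, 0.1), (0.5, 0.9)]

selected = select_top(toy_pairs, toy_scores, weights=(0.5, 0.5), keep=2)
```

With equal weights, the second toy pair has the lowest combined score and is the one filtered out.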

Tasks

Language Modeling, Machine Translation, Outlier Detection, Sentence
