MTWatch: A Tool for the Analysis of Noisy Parallel Data

2014-05-01 · LREC 2014

Sandipan Dandapat, Declan Groves

Abstract

State-of-the-art statistical machine translation (SMT) techniques require good-quality parallel data to build a translation model. The availability of large parallel corpora has increased rapidly over the past decade. However, these newly developed parallel corpora often contain significant noise. In this paper, we describe our approach for classifying good-quality parallel sentence pairs from noisy parallel data. We use 10 different features within a Support Vector Machine (SVM)-based model for our classification task. We report a reasonably good classification accuracy and its positive effect on overall MT accuracy.
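
The abstract does not list the 10 features, so the following is only a minimal sketch of how a sentence pair might be turned into a numeric feature vector before being passed to an SVM classifier. The function name `pair_features` and the three features shown (token-length ratio, overlap of numeric tokens, and an exact-copy flag) are illustrative assumptions, not the features used in the paper.

```python
def pair_features(source: str, target: str) -> list[float]:
    """Map a sentence pair to a small feature vector.

    All three features are hypothetical examples; the paper's actual
    10 features are not given in the abstract.
    """
    src_tokens = source.split()
    tgt_tokens = target.split()

    # Token-length ratio (shorter over longer); values near 1.0 suggest
    # the two sides are of comparable length.
    longer = max(len(src_tokens), len(tgt_tokens), 1)
    length_ratio = min(len(src_tokens), len(tgt_tokens)) / longer

    # Share of numeric tokens that appear on both sides; numbers usually
    # survive translation unchanged. Pairs with no numbers score 1.0.
    src_nums = {t for t in src_tokens if t.isdigit()}
    tgt_nums = {t for t in tgt_tokens if t.isdigit()}
    all_nums = src_nums | tgt_nums
    number_overlap = len(src_nums & tgt_nums) / len(all_nums) if all_nums else 1.0

    # Flag pairs whose target is an untranslated copy of the source.
    is_copy = 1.0 if source.strip() == target.strip() else 0.0

    return [length_ratio, number_overlap, is_copy]
```

Vectors of this kind, computed for pairs labelled as good or noisy, could then be used to train any standard SVM implementation; the choice of kernel and library is left open here because the abstract does not specify it.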
