Noisy Parallel Corpus Filtering through Projected Word Embeddings

2019-08-01 · WS 2019

Murathan Kurfalı, Robert Östling

Abstract

We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.
