Noisy Parallel Corpus Filtering through Projected Word Embeddings
2019-08-01 · WS 2019
Murathan Kurfalı, Robert Östling
Abstract
We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. Despite its simplicity, the method approaches the performance of the strong baseline system in the downstream machine translation evaluation.
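The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of one way embedding projection could be used to score sentence pairs, not the authors' implementation. It assumes a hypothetical translation lexicon (`lexicon`) mapping each low-resource word to high-resource translations, gives each low-resource word the mean vector of its translations, and scores a sentence pair by the cosine similarity of averaged word vectors. All function names, the toy vectors, and the averaging and cosine choices are assumptions made for illustration.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of tokens that have an embedding; None if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_embeddings(high_embeddings, lexicon):
    """Assumed projection: each low-resource word gets the mean vector
    of its high-resource translations listed in the (hypothetical) lexicon."""
    projected = {}
    for word, translations in lexicon.items():
        vec = mean_vector(translations, high_embeddings)
        if vec is not None:
            projected[word] = vec
    return projected


def score_pair(low_tokens, high_tokens, projected, high_embeddings):
    """Score a sentence pair; pairs with no known words on a side score 0.0."""
    u = mean_vector(low_tokens, projected)
    v = mean_vector(high_tokens, high_embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


# Toy data, invented purely for illustration.
high_embeddings = {"dog": [1.0, 0.0], "house": [0.0, 1.0]}
lexicon = {"hund": ["dog"], "hus": ["house"]}
projected = project_embeddings(high_embeddings, lexicon)

good = score_pair(["hund"], ["dog"], projected, high_embeddings)
bad = score_pair(["hund"], ["house"], projected, high_embeddings)
```

In a filtering setting, pairs would then be ranked by this score and the lowest-scoring ones discarded; the cutoff is a tuning choice that this sketch does not address.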