Translated Texts Under the Lens: From Machine Translation Detection to Source Language Identification

2022-01-16 · ACL ARR January 2022

Anonymous


Abstract

In this work, we tackle the detection of translated texts from several angles. Beyond the classic task of machine translation detection, we investigate and find common patterns across different machine translation systems as well as across different source languages. We then show that it is possible to identify the translation system used to produce a translated text (F1-score 88.5%) as well as the source language of the original text (F1-score 79%). We evaluate our tasks on Books, a new dataset we built from scratch from excerpts of novels, and on the well-known Europarl dataset.

Tasks

Language Identification, Machine Translation, Translation
