Bilingual dictionaries for all EU languages

2014-05-01 · LREC 2014 · Code Available

Ahmet Aker, Monica Paramita, Mārcis Pinnis, Robert Gaizauskas

Abstract

Bilingual dictionaries can be generated automatically with the GIZA++ tool. However, these dictionaries contain a lot of noise, which degrades the output quality of tools that rely on them. In this work we present three methods for cleaning noise from automatically generated bilingual dictionaries: an LLR-based, a pivot-based and a transliteration-based approach. We applied these approaches to GIZA++ dictionaries covering the official EU languages in order to remove noise. Our evaluation showed that all three methods help to reduce noise, with the best performance achieved by the transliteration-based approach. We provide all bilingual dictionaries (the original GIZA++ dictionaries and the cleaned ones), as well as the cleaning tools and scripts, free for download.
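
The abstract names an LLR-based filter without giving details. As an illustration only, and assuming LLR refers to the log-likelihood ratio association score computed from co-occurrence counts in a parallel corpus, filtering could be sketched roughly as below. The function names, the count layout and the threshold are hypothetical and are not taken from the authors' released tools.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: aligned sentence pairs containing both the source and target word
    k12: pairs containing the source word but not the target word
    k21: pairs containing the target word but not the source word
    k22: pairs containing neither word
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def filter_entries(entries, threshold=10.83):
    """Keep dictionary entries whose word pair is positively associated.

    entries: iterable of (source_word, target_word, (k11, k12, k21, k22)).
    threshold: hypothetical cut-off; 10.83 is roughly the chi-square
    critical value for p < 0.001 with one degree of freedom.
    """
    kept = []
    for src, tgt, (k11, k12, k21, k22) in entries:
        n = k11 + k12 + k21 + k22
        # A high score can also mean the words avoid each other, so require
        # that they co-occur more often than chance would predict.
        co_occur_more_than_chance = k11 * n > (k11 + k12) * (k11 + k21)
        if co_occur_more_than_chance and llr(k11, k12, k21, k22) > threshold:
            kept.append((src, tgt))
    return kept
```

In this sketch, a pair whose counts look statistically independent scores close to zero and is dropped, while a pair that co-occurs far more often than chance is kept. The counts would have to come from the same parallel corpus that produced the GIZA++ dictionary.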
