
Normalizing Non-canonical Turkish Texts Using Machine Translation Approaches

2019-07-01 · ACL 2019

Talha Çolakoğlu, Umut Sulubacak, Ahmet Cüneyd Tantuğ


Abstract

With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization makes it possible to use this data as a practical source of training data for language processing systems. The state of the art in Turkish text normalization consists of a token-level pipeline of modules that depends heavily on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer processing stages. Experiments with various implementations of our approach show that we surpass the current best-performing system by a large margin.
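Treating normalization as translation means training on parallel pairs of non-canonical (source) and canonical (target) sentences, with whole sentences as input so the model sees context instead of isolated tokens. The abstract does not say how the text is segmented, so the sketch below assumes a character-level formulation. The helper names, the `_` word-boundary symbol, and the example sentences are all hypothetical, chosen only to illustrate how such parallel data could be prepared.

```python
# Hypothetical sketch: preparing sentence-level normalization data for a
# character-level MT model. The segmentation scheme, the "_" boundary symbol,
# and all function names are illustrative assumptions, not the paper's method.

BOUNDARY = "_"  # assumed marker for spaces; assumes input has no literal "_"


def to_char_tokens(sentence: str) -> str:
    """Split a whole sentence into space-separated characters.

    Spaces become BOUNDARY tokens so word breaks survive tokenization and
    the model can learn to insert or delete them (e.g. "iyimisin" -> "iyi misin").
    """
    return " ".join(BOUNDARY if ch == " " else ch for ch in sentence)


def from_char_tokens(tokens: str) -> str:
    """Invert to_char_tokens: rebuild a sentence from character-level output."""
    return "".join(" " if t == BOUNDARY else t for t in tokens.split(" "))


def make_parallel_pair(noisy: str, canonical: str) -> tuple[str, str]:
    """Turn a (non-canonical, canonical) sentence pair into source/target lines."""
    return to_char_tokens(noisy), to_char_tokens(canonical)


if __name__ == "__main__":
    # Hypothetical example: missing diacritics and a merged question particle.
    src, tgt = make_parallel_pair("nasilsin iyimisin", "nasılsın iyi misin")
    print(src)  # n a s i l s i n _ i y i m i s i n
    print(tgt)  # n a s ı l s ı n _ i y i _ m i s i n
    assert from_char_tokens(tgt) == "nasılsın iyi misin"
```

Lines produced this way could be written to source and target files for an MT toolkit of one's choice. A character-level view is one plausible choice here because many Turkish normalization errors (missing diacritics such as ı/i, merged or split words) are edits below the word level, but other segmentations, such as subword units, are equally possible.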
