An Assessment of Language Identification Methods on Tweets and Wikipedia Articles
2020-07-01 · WS 2020
Pedro Vernetti, Larissa Freitas
Abstract
Language identification is the task of determining the language in which a given text is written. This task is important for many Natural Language Processing and Information Retrieval applications. Two popular approaches to language identification are N-gram models and stopword models. In this paper, these two models were tested on different types of documents: short, irregular texts (tweets) and long, regular texts (Wikipedia articles).
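The abstract names the two model families without describing how they work. As a rough illustration only, the Python sketch below implements one common variant of each: a stopword model that counts how many tokens appear in each language's stopword list, and a character N-gram model that compares ranked N-gram frequency profiles using the "out-of-place" rank distance associated with Cavnar and Trenkle. This is not the authors' implementation. The stopword lists, the training sentences in `SAMPLES`, the N-gram range (1 to 3) and the profile size (`top_k=300`) are all hypothetical values chosen to keep the example short.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists; real systems use much longer lists.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it", "was", "for"},
    "pt": {"o", "a", "de", "que", "e", "do", "da", "em", "um", "para", "não"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "se", "del", "las", "por"},
}

# Hypothetical one-sentence "training corpora" used to build N-gram profiles.
SAMPLES = {
    "en": "the cat sat on the mat and the dog was in the house",
    "pt": "o gato estava no tapete e o cachorro estava em casa",
    "es": "el gato estaba en la alfombra y el perro estaba en la casa",
}


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3, so "não" stays whole.
    return re.findall(r"\w+", text.lower())


def stopword_identify(text):
    """Stopword model: pick the language whose list matches the most tokens."""
    tokens = tokenize(text)
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    # On a tie (including all-zero scores) max() returns the first language listed.
    return max(scores, key=scores.get)


def ngram_profile(text, n_max=3, top_k=300):
    """Rank character N-grams (lengths 1..n_max) by frequency."""
    counts = Counter()
    for token in tokenize(text):
        padded = f"_{token}_"  # "_" marks word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending count, then alphabetical, so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; N-grams absent from the language profile get the maximum penalty."""
    max_penalty = len(lang_profile)
    return sum(abs(rank - lang_profile[g]) if g in lang_profile else max_penalty
               for g, rank in doc_profile.items())


def ngram_identify(text, profiles):
    """N-gram model: pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(stopword_identify("o gato não está em casa"))
    print(ngram_identify("el perro estaba en la casa", PROFILES))
```

One design point relevant to the tweets-versus-Wikipedia comparison: the stopword model only has evidence when a text contains function words, so a very short or irregular text can score zero for every language (this sketch then simply falls back to the first language in `STOPWORDS`). The N-gram model always builds a profile, even from a few words, although a profile built from so little text is far less reliable than one built from a full article.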