
An Assessment of Language Identification Methods on Tweets and Wikipedia Articles

2020-07-01 · WS 2020

Pedro Vernetti, Larissa Freitas


Abstract

Language identification is the task of determining the language in which a given text is written. This task is important for Natural Language Processing and Information Retrieval activities. Two popular approaches to language identification are the n-gram and stopword models. In this paper, these two models were tested on different types of documents, such as short, irregular texts (tweets) and long, regular texts (Wikipedia articles).
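
The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword lists, the training samples, the trigram size, and all function names (`identify_by_stopwords`, `identify_by_ngrams`, and so on) are assumptions chosen for illustration, and the scoring rules (stopword hit count, trigram count overlap) are simple stand-ins for whatever the paper actually uses.

```python
from collections import Counter

# Tiny illustrative stopword lists (hypothetical, not taken from the paper).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "pt": {"o", "e", "de", "que", "do", "da", "em", "um", "para"},
    "es": {"el", "la", "y", "de", "que", "en", "los", "un", "por"},
}

# Tiny illustrative training samples for the n-gram model (hypothetical).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog",
    "pt": "o rato roeu a roupa do rei de roma",
}


def identify_by_stopwords(text):
    """Pick the language whose stopword list matches the most tokens."""
    tokens = text.lower().split()
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    # Ties (including an all-zero score) resolve to the first language listed.
    return max(scores, key=scores.get)


def char_ngrams(text, n=3):
    """Count character n-grams, padding with spaces to mark word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def build_profiles(samples, n=3):
    """Build one n-gram count profile per language from sample text."""
    return {lang: char_ngrams(sample, n) for lang, sample in samples.items()}


def identify_by_ngrams(text, profiles, n=3):
    """Pick the language whose profile shares the most n-gram counts."""
    grams = char_ngrams(text, n)

    def overlap(profile):
        # Counter returns 0 for n-grams missing from the profile.
        return sum(min(count, profile[g]) for g, count in grams.items())

    return max(profiles, key=lambda lang: overlap(profiles[lang]))
```

A call such as `identify_by_ngrams("the dog", build_profiles(SAMPLES))` scores the input against each profile. With so little text per language, the stopword model tends to fail on inputs that contain no stopwords at all, which is one reason short tweets and long Wikipedia articles are worth testing separately.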
