An Assessment of Language Identification Methods on Tweets and Wikipedia Articles

2020-07-01 · WS 2020

Pedro Vernetti, Larissa Freitas

Abstract

Language identification is the task of determining the language in which a given text is written. This task is important for Natural Language Processing and Information Retrieval activities. Two popular approaches to language identification are the n-gram and stopword models. In this paper, these two models are tested on different types of documents: short, irregular texts (tweets) and long, regular texts (Wikipedia articles).
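
To make the two approaches named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes tiny hand-written stopword lists and one-sentence training samples (`STOPWORDS`, `SAMPLES`), and the function names are hypothetical. The stopword model counts how many tokens appear in each language's stopword list. The n-gram model counts how many character trigrams of the input appear in each language's trigram profile.

```python
from collections import Counter

# Illustrative stopword lists (assumption: not the lists used in the paper).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that"},
    "pt": {"o", "e", "de", "que", "em", "um", "para", "do"},
}

# Illustrative one-sentence training samples (assumption: a real system
# would build profiles from a large corpus per language).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "pt": "a rápida raposa marrom pula sobre o cão preguiçoso e esta é a língua do texto",
}


def identify_by_stopwords(text):
    """Return the language whose stopword list matches the most tokens."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(sample, n=3, top_k=300):
    """Keep the top_k most frequent n-grams of a training sample."""
    counts = Counter(char_ngrams(sample, n))
    return {gram for gram, _ in counts.most_common(top_k)}


PROFILES = {lang: build_profile(sample) for lang, sample in SAMPLES.items()}


def identify_by_ngrams(text, n=3):
    """Return the language whose profile contains the most input n-grams."""
    grams = char_ngrams(text, n)
    scores = {
        lang: sum(gram in profile for gram in grams)
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)
```

The sketch also hints at why document length matters for the comparison: a short tweet may contain few or no stopwords, leaving the stopword scores near zero, while character n-grams still yield some evidence from every word.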

Tasks

Articles, Information Retrieval, Language Identification, Retrieval
