Experiments on a Guarani Corpus of News and Social Media

2021-06-01, NAACL (AmericasNLP) 2021

Santiago Góngora, Nicolás Giossa, Luis Chiruzzo


Abstract

While Guarani is widely spoken in South America, obtaining a large amount of Guarani text from the web is hard. We describe the construction of a Guarani corpus composed of a parallel Guarani-Spanish set of news articles and a monolingual set of tweets. We run word embedding experiments to evaluate the quality of the Guarani split of the corpus. The results are encouraging, but they suggest that more diversity in text domains might be needed for further improvements.

Tasks

Articles, Diversity, Word Embeddings
