
VarClass: An Open-source Language Identification Tool for Language Varieties

2014-05-01 · LREC 2014

Marcos Zampieri, Binyam Gebre


Abstract

This paper presents VarClass, an open-source language identification tool available both for download and through a user-friendly graphical interface. What distinguishes VarClass from other state-of-the-art language identification tools is its focus on language varieties. General-purpose language identification tools do not take language varieties into account, and our work aims to fill this gap. VarClass currently contains language models for over 27 languages, 10 of which are language varieties. We report an average accuracy of over 90.5% on a challenging dataset. More language models will be included in the coming months.
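The abstract does not say how VarClass builds or scores its language models. As a rough illustration only, here is a minimal sketch of one common approach to language identification: character trigram models with add-one smoothing, where a text is assigned the label whose model gives it the highest log-probability. This is an assumed technique, not VarClass's implementation. The class `NgramLanguageIdentifier`, the helper functions, and the toy `pt-BR`/`pt-PT` sentences are all hypothetical, and the `accuracy` helper just shows how an accuracy figure is computed (correct predictions divided by total).

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding with spaces at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical character n-gram classifier (an illustration, not VarClass)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha   # add-alpha smoothing; alpha=1.0 is add-one (Laplace)
        self.counts = {}     # label -> Counter of n-gram frequencies
        self.totals = {}     # label -> total number of n-grams seen
        self.vocab = set()   # all n-grams seen across every label

    def train(self, label, texts):
        counter = self.counts.setdefault(label, Counter())
        for text in texts:
            counter.update(char_ngrams(text, self.n))
        self.totals[label] = sum(counter.values())
        self.vocab.update(counter)

    def score(self, label, text):
        """Smoothed log-probability of the text's n-grams under one label's model."""
        counter = self.counts[label]
        # The +1 on the vocabulary size reserves probability mass for unseen n-grams.
        denom = self.totals[label] + self.alpha * (len(self.vocab) + 1)
        return sum(math.log((counter[g] + self.alpha) / denom)
                   for g in char_ngrams(text, self.n))

    def predict(self, text):
        """Return the label whose model assigns the text the highest score."""
        return max(self.counts, key=lambda label: self.score(label, text))


def accuracy(model, labelled):
    """Fraction of (text, gold_label) pairs the model labels correctly."""
    correct = sum(model.predict(text) == gold for text, gold in labelled)
    return correct / len(labelled)


# Toy training data, invented for this sketch (two Portuguese varieties).
model = NgramLanguageIdentifier(n=3)
model.train("pt-BR", ["você está usando o celular no ônibus",
                      "a gente vai pegar o trem amanhã"])
model.train("pt-PT", ["tu estás a usar o telemóvel no autocarro",
                      "nós vamos apanhar o comboio amanhã"])

test_set = [("o celular no ônibus", "pt-BR"),
            ("o telemóvel no autocarro", "pt-PT")]
for text, gold in test_set:
    print(f"{text!r}: predicted {model.predict(text)}, gold {gold}")
print("accuracy:", accuracy(model, test_set))  # accuracy: 1.0
```

Varieties of one language share most of their character n-grams, which is why telling them apart is harder than separating unrelated languages; in this toy example the decision rests on variety-specific vocabulary such as *telemóvel* versus *celular*.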
