Applying Multilingual and Monolingual Transformer-Based Models for Dialect Identification

2020-12-01 · VarDial (COLING) 2020

Cristian Popa, Vlad Ștefănescu

Abstract

We study the ability of large fine-tuned transformer models to solve a binary dialect identification task, with a particular interest in comparing multilingual and monolingual models. The corpus analyzed contains Romanian and Moldavian samples from the news domain, as well as tweets used to assess performance. We find that the monolingual models are superior to the multilingual ones, and that the best results are obtained using an SVM ensemble of 5 different transformer-based models. We provide our experimental results and an analysis of the attention mechanisms of the best-performing individual classifiers to explain their decisions. We have released our code under an open-source license.
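To make the "SVM ensemble of 5 different transformer-based models" concrete, here is a minimal sketch of one way such an ensemble could be stacked. It is not the authors' implementation: the abstract does not say which features the SVM receives or which kernel it uses, so the sketch assumes each base model contributes one class probability and that the SVM is linear. The base-model outputs are simulated with synthetic data, and every name here (`simulate_base_outputs`, `train_linear_svm`, the {-1, +1} label encoding) is hypothetical.

```python
import random

N_MODELS = 5  # the abstract reports an ensemble of 5 transformer-based models


def simulate_base_outputs(n_samples, seed):
    """Synthetic stand-in for the fine-tuned transformers' outputs (assumption).

    Returns (X, y): each row of X holds N_MODELS class probabilities and
    y holds labels in {-1, +1}, a hypothetical encoding of the two dialects.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.choice([-1, 1])
        row = []
        for _ in range(N_MODELS):
            # Each simulated model is noisy but better than chance.
            p = 0.5 + 0.25 * label + rng.gauss(0.0, 0.2)
            row.append(min(1.0, max(0.0, p)))
        X.append(row)
        y.append(label)
    return X, y


def center(row):
    """Shift probabilities from [0, 1] to [-0.5, 0.5] so the bias stays small."""
    return [p - 0.5 for p in row]


def decision(w, b, row):
    """Signed score of the linear SVM for one row of base-model probabilities."""
    return sum(wj * xj for wj, xj in zip(w, center(row))) + b


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Linear SVM fitted by subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1.0
            # L2 shrinkage is applied on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for margin violations.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, center(X[i]))]
                b += lr * y[i]
    return w, b


def predict(w, b, row):
    """Ensemble label in {-1, +1}."""
    return 1 if decision(w, b, row) >= 0.0 else -1


def accuracy(w, b, X, y):
    hits = sum(predict(w, b, row) == label for row, label in zip(X, y))
    return hits / len(y)


if __name__ == "__main__":
    X_train, y_train = simulate_base_outputs(200, seed=1)
    X_test, y_test = simulate_base_outputs(100, seed=2)
    w, b = train_linear_svm(X_train, y_train)
    print(f"held-out accuracy: {accuracy(w, b, X_test, y_test):.2f}")
```

In a real pipeline, `simulate_base_outputs` would be replaced by the held-out predictions of the five fine-tuned models, and a library SVM would normally stand in for the hand-written trainer.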
