
Comparing the Performance of CNNs and Shallow Models for Language Identification

2021-04-01 · EACL (VarDial) 2021 · Code Available

Andrea Ceolin


Abstract

In this work, we compare the performance of convolutional neural networks and shallow models on three of the four language identification shared tasks proposed in the VarDial Evaluation Campaign 2021. In our experiments, convolutional neural networks and shallow models yielded comparable performance on the Romanian Dialect Identification (RDI) and Dravidian Language Identification (DLI) shared tasks once the training data was augmented, while an ensemble of support vector machines and Naïve Bayes models was the best-performing model on the Uralic Language Identification (ULI) task. Although the deep learning models did not achieve state-of-the-art performance on the tasks and tended to overfit the data, the ensemble method was one of only two methods to beat the existing baseline for the first track of the ULI shared task.
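The abstract's best-performing shallow approach is an ensemble of support vector machines and Naïve Bayes models. A minimal sketch of that idea, assuming scikit-learn, character n-gram features, and toy placeholder data (none of which are taken from the paper), might look like:

```python
# Illustrative sketch only, not the paper's code: a hard-voting ensemble of
# a linear SVM and a Naive Bayes classifier over character n-gram features,
# a common setup for language identification. Data and hyperparameters are
# toy placeholders.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: sentences labelled with a language code.
texts = [
    "buna ziua, ce mai faceti",   # Romanian
    "multumesc foarte mult",      # Romanian
    "hello, how are you today",   # English
    "thank you very much",        # English
]
labels = ["ro", "ro", "en", "en"]

# Character n-grams are a standard feature choice for language ID,
# since they capture orthographic cues without word-level tokenization.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))

ensemble = make_pipeline(
    vectorizer,
    VotingClassifier(
        estimators=[("svm", LinearSVC()), ("nb", MultinomialNB())],
        voting="hard",  # majority vote over the two classifiers' labels
    ),
)
ensemble.fit(texts, labels)
print(ensemble.predict(["multumesc, buna ziua"])[0])
```

Hard voting sidesteps the fact that `LinearSVC` exposes no `predict_proba`; a soft-voting variant would require probability calibration for the SVM.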
