
Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification

2019-08-01 · WS 2019

Bashar Talafha, Wael Farhan, Ahmed Altakrouri, Hussein Al-Natsheh


Abstract

Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine learning and deep learning approaches to predict 21 fine-grained dialects from a set of given tweets per user. We adopted numerous feature extraction methods, most of which improved the final model, such as word embeddings, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user voting mechanism, we were able to achieve a macro-averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.
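The abstract does not describe the voting mechanism itself, so the following is only a minimal sketch of the general idea of turning tweet-level predictions into one dialect label per user. It uses plain majority voting, which is a simpler stand-in and not the authors' "relatively complex" mechanism. The function name `vote_user_dialects`, the user IDs, the labels, and the alphabetical tie-break are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def vote_user_dialects(tweet_predictions):
    """Aggregate tweet-level dialect labels into one label per user.

    tweet_predictions: iterable of (user_id, predicted_label) pairs,
    e.g. the output of a tweet-level classifier.
    Returns a dict mapping user_id -> most frequent label.
    Ties are broken alphabetically (an assumption, chosen only so the
    result is deterministic).
    """
    votes = defaultdict(Counter)
    for user_id, label in tweet_predictions:
        votes[user_id][label] += 1
    return {
        user_id: min(counts, key=lambda lab: (-counts[lab], lab))
        for user_id, counts in votes.items()
    }


# Hypothetical tweet-level predictions, not data from the paper.
predictions = [
    ("user_a", "Jordan"), ("user_a", "Jordan"), ("user_a", "Egypt"),
    ("user_b", "Iraq"), ("user_b", "Egypt"),
]
user_labels = vote_user_dialects(predictions)
```

In a pipeline of the kind the abstract outlines, the tweet-level labels would come from a classifier such as a LinearSVC over TF-IDF and other features; the aggregation step stays independent of how those labels are produced.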
