MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)

2019-08-01 · WS 2019

Dhaou Ghoul, Gaël Lejeune


Abstract

We present MICHAEL, a simple, lightweight method for automatic Arabic Dialect Identification on the MADAR travel-domain Dialect Identification (DID) task. MICHAEL uses simple character-level features, so classification requires no pre-processing. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. This system achieved an official score (accuracy) of 53.25% with 1 ≤ N ≤ 3, but showed a much better result with character 4-grams (62.17% accuracy).
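The abstract's pipeline (character n-grams from raw sentences feeding a Multinomial Naive Bayes classifier) can be sketched from scratch as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `char_ngrams` and `CharNgramNB` are hypothetical, add-one (Laplace) smoothing is assumed, and the toy transliterated strings and labels `A`/`B` are invented for illustration, not MADAR data. The `n_min`/`n_max` parameters correspond to the N range in the abstract; reading "character 4-grams" as `n_min = n_max = 4` is one possible interpretation.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min, n_max):
    """Count every character n-gram with n_min <= n <= n_max.
    The raw sentence is used as-is: no normalization or tokenization."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


class CharNgramNB:
    """Hypothetical Multinomial Naive Bayes over character n-gram counts
    (assumption: add-alpha smoothing with alpha=1.0)."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # class -> n-gram -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram tokens observed per class (likelihood denominator).
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def _log_posterior(self, grams, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.totals[label] + self.alpha * len(self.vocab)
        counts = self.feature_counts[label]
        for gram, k in grams.items():
            if gram not in self.vocab:
                continue  # n-grams never seen in training are ignored
            score += k * math.log((counts[gram] + self.alpha) / denom)
        return score

    def predict(self, text):
        grams = char_ngrams(text, self.n_min, self.n_max)
        return max(self.class_counts, key=lambda c: self._log_posterior(grams, c))


# Toy usage with invented data (not from MADAR).
texts = ["shlonak", "shlonich", "shlon el-jaw", "ezzayak", "ezzayek", "ezzay el-gaw"]
labels = ["A", "A", "A", "B", "B", "B"]
model = CharNgramNB(n_min=1, n_max=3).fit(texts, labels)
print(model.predict("shlonkum"))  # A
```

Because the features are overlapping substrings of the untouched sentence, no tokenizer or normalizer is needed, which matches the "no pre-processing" claim. With a library, roughly the same setup could be expressed as scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(1, 3))` followed by `MultinomialNB()`.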

Tasks

Dialect Identification, General Classification
