Arabic Dialect Identification in Speech Transcripts

2016-12-01 · WS 2016

Shervin Malmasi, Marcos Zampieri

Abstract

In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus. We competed under the team name MAZA in the Arabic Dialect Identification sub-task of the 2016 Discriminating between Similar Languages (DSL) shared task. Our system achieved an F1-score of 0.51 in the closed training track, ranking first among the 18 teams that participated in the sub-task. Our system utilizes a classifier ensemble with a set of linear models as base classifiers. We experimented with three different ensemble fusion strategies, with the mean probability approach providing the best performance.
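
The mean probability fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each base classifier outputs one probability per class, and the function names (`mean_probability_fusion`, `predict`) and the example probabilities are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

# The five classes named in the abstract.
LABELS = ["Egyptian", "Gulf", "Levantine", "North African", "MSA"]


def mean_probability_fusion(base_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities across all base classifiers."""
    if not base_probs:
        raise ValueError("need at least one base classifier output")
    n_classes = len(base_probs[0])
    if any(len(p) != n_classes for p in base_probs):
        raise ValueError("all base classifiers must score the same classes")
    n_models = len(base_probs)
    return [sum(p[c] for p in base_probs) / n_models for c in range(n_classes)]


def predict(base_probs: Sequence[Sequence[float]],
            labels: Sequence[str] = LABELS) -> str:
    """Return the label with the highest mean probability."""
    fused = mean_probability_fusion(base_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical outputs of three base classifiers for a single transcript
# (made-up numbers, not results from the paper).
example = [
    [0.50, 0.20, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.35, 0.30, 0.15, 0.10, 0.10],
]
print(predict(example))  # prints "Gulf"
```

In these made-up numbers, two of the three classifiers individually prefer Egyptian, yet the averaged probabilities favor Gulf. Averaging weighs how confident each classifier is rather than only counting its top choice.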
