Arabic Dialect Identification in Speech Transcripts

2016-12-01 · WS 2016

Shervin Malmasi, Marcos Zampieri


Abstract

In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus. We competed under the team name MAZA in the Arabic Dialect Identification sub-task of the 2016 Discriminating between Similar Languages (DSL) shared task. Our system achieved an F1-score of 0.51 in the closed training track, ranking first among the 18 teams that participated in the sub-task. Our system utilizes a classifier ensemble with a set of linear models as base classifiers. We experimented with three different ensemble fusion strategies, with the mean probability approach providing the best performance.
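The mean probability fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs a probability distribution over the five classes; the function name `mean_probability_fusion`, the label order, and the example numbers are hypothetical and not taken from the paper. The ensemble averages the per-class probabilities across classifiers and predicts the class with the highest mean.

```python
from typing import List, Sequence, Tuple

# Class names follow the abstract; their order here is an arbitrary assumption.
LABELS = ["Egyptian", "Gulf", "Levantine", "North African", "MSA"]


def mean_probability_fusion(
    per_classifier_probs: Sequence[Sequence[float]],
    labels: Sequence[str] = LABELS,
) -> Tuple[str, List[float]]:
    """Average class probabilities over classifiers and pick the top class.

    per_classifier_probs holds one probability vector per base classifier,
    each with one entry per label.
    """
    if not per_classifier_probs:
        raise ValueError("at least one classifier output is required")
    n_classes = len(labels)
    for probs in per_classifier_probs:
        if len(probs) != n_classes:
            raise ValueError("each probability vector must match the label count")

    n_classifiers = len(per_classifier_probs)
    # Mean probability of each class across all base classifiers.
    mean_probs = [
        sum(probs[k] for probs in per_classifier_probs) / n_classifiers
        for k in range(n_classes)
    ]
    # The predicted class is the one with the highest mean probability.
    best = max(range(n_classes), key=lambda k: mean_probs[k])
    return labels[best], mean_probs


# Made-up outputs from three hypothetical base classifiers for one transcript.
example_probs = [
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.05, 0.05],
    [0.40, 0.30, 0.10, 0.10, 0.10],
]
predicted_label, fused_probs = mean_probability_fusion(example_probs)
```

In this made-up example two of the three classifiers rank Egyptian first, yet the fused prediction is Gulf, because the second classifier's high confidence raises the Gulf mean to 0.40 against roughly 0.37 for Egyptian. This shows how averaging probabilities can differ from a plurality vote over predicted labels.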

Tasks

Dialect Identification, Machine Translation
