JHU System Description for the MADAR Arabic Dialect Identification Shared Task

2019-08-01 · WS 2019

Tom Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee

Abstract

Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching (PPM), an ensemble of neural architectures, and additional data sources for training word embeddings and auxiliary language models. We found that several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved the best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests that the various architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.
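Prediction by Partial Matching (PPM) is the central language-modeling technique named above. The sketch below shows one common way to apply PPM to dialect identification: train one character-level model per dialect label, then assign a sentence to the label whose model encodes it in the fewest bits (lowest cross-entropy). It uses the PPM-C escape estimate with symbol exclusion. The escape variant, the default `order`, the hypothetical `PPMModel`, `train_classifier` and `classify` names, and the toy data are illustrative assumptions, not details taken from the JHU system.

```python
import math
from collections import defaultdict


class PPMModel:
    """Character-level PPM model: PPM-C escape estimate with symbol exclusion."""

    def __init__(self, order: int, alphabet_size: int):
        self.order = order  # maximum context length in characters
        self.alphabet_size = alphabet_size  # used by the uniform order -1 fallback
        # counts[context][char] = number of times `char` followed `context` in training.
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text: str) -> None:
        for i, ch in enumerate(text):
            # Update every context of length 0..order that ends just before position i.
            for k in range(min(self.order, i) + 1):
                self.counts[text[i - k:i]][ch] += 1

    def char_prob(self, history: str, ch: str) -> float:
        prob = 1.0
        excluded = set()
        # Back off from the longest available context down to the empty context.
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                continue  # context never seen in training: back off at no cost
            # Symbols already ruled out at a longer context are excluded here.
            candidates = {c: n for c, n in table.items() if c not in excluded}
            total, distinct = sum(candidates.values()), len(candidates)
            if total == 0:
                continue
            if ch in candidates:
                return prob * candidates[ch] / (total + distinct)
            # PPM-C escape probability: distinct / (total + distinct).
            prob *= distinct / (total + distinct)
            excluded.update(candidates)
        # Order -1: uniform over the symbols that were never excluded.
        return prob / (self.alphabet_size - len(excluded))

    def bits(self, text: str) -> float:
        """Code length of `text` in bits; lower means the model fits better."""
        return -sum(
            math.log2(self.char_prob(text[max(0, i - self.order):i], ch))
            for i, ch in enumerate(text)
        )


def train_classifier(labeled, order=3):
    """Hypothetical helper: one PPM model per label over a shared alphabet."""
    alphabet = {ch for text, _ in labeled for ch in text}
    size = len(alphabet) + 1  # +1 reserves a slot for characters unseen in training
    models = {}
    for text, label in labeled:
        if label not in models:
            models[label] = PPMModel(order, size)
        models[label].train(text)
    return models


def classify(models, text):
    """Pick the label whose model compresses `text` best."""
    return min(models, key=lambda label: models[label].bits(text))


if __name__ == "__main__":
    # Toy data only; real use would train on dialect-labeled sentences.
    data = [("abababab", "A"), ("ababab", "A"), ("xyzxyzxyz", "B"), ("xyzxyz", "B")]
    models = train_classifier(data, order=2)
    print(classify(models, "abab"))  # prints A
    print(classify(models, "xyzx"))  # prints B
```

The `order` argument is the maximum context length in characters, so it is the parameter one would vary when comparing higher- and lower-order models. Exclusion keeps each per-character distribution normalized over the shared alphabet plus the unseen-character slot, which makes code lengths comparable across the per-label models.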

Tasks

Dialect Identification, Language Modeling, Word Embeddings