MHE: Code-Mixed Corpora for Similar Language Identification

2022-06-01 · LREC 2022

Priya Rani, John P. McCrae, Theodorus Fransen

Abstract

This paper introduces a new Magahi-Hindi-English (MHE) code-mixed data-set for similar language identification (SMLID), where Magahi is a less-resourced minority language. The corpus provides language identification labels at two levels: word and sentence. It is the first Magahi-Hindi-English code-mixed data-set for the similar language identification task. We also discuss the complexity of the data-set and provide a few baselines for the language identification task.
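
To make the two annotation levels concrete, the following is a minimal sketch of how one sentence with word-level and sentence-level language labels might be represented. It is not the corpus's actual file format, which the abstract does not describe. The class name `AnnotatedSentence`, the helper functions, the tag names `mag`, `hin`, `eng`, and the placeholder tokens are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class AnnotatedSentence:
    """One sentence with labels at both levels (hypothetical schema)."""

    tokens: list[str]
    word_labels: list[str]   # one language tag per token
    sentence_label: str      # one language tag for the whole sentence

    def __post_init__(self) -> None:
        # Word-level annotation only makes sense if every token has a tag.
        if len(self.tokens) != len(self.word_labels):
            raise ValueError("tokens and word_labels must have equal length")


def label_distribution(sentence: AnnotatedSentence) -> Counter:
    """Count how many tokens carry each word-level language tag."""
    return Counter(sentence.word_labels)


def is_code_mixed(sentence: AnnotatedSentence) -> bool:
    """Treat a sentence as code-mixed if its tokens carry more than one tag."""
    return len(set(sentence.word_labels)) > 1


# Placeholder tokens and assumed tags, not real corpus data.
example = AnnotatedSentence(
    tokens=["tok1", "tok2", "tok3", "tok4"],
    word_labels=["mag", "mag", "hin", "eng"],
    sentence_label="mag",
)
```

Keeping the sentence label as a separate field, rather than deriving it from the word labels, reflects only that the abstract lists the two levels separately. How the authors actually assign sentence-level labels is not stated here.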

Tasks

Language Identification, Sentence
