
First Bilingual Word Embeddings for te reo Māori and English: Towards Code-switching Detection in a Low-resourced setting

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

Māori speakers are bilingual and code-switch between Māori and English. Because Māori is a low-resourced language for technology development, very few resources are available for Māori-English code-switch detection. This research collected the Māori-English Words corpus, containing more than 71M words, and developed the first open-source Māori-English bilingual word embeddings model. We provide experimental evidence that using our bilingual embeddings for code-switch detection improves F1-scores by at least 5% compared with existing English-only embeddings trained on a relatively large dataset. This is the first study to provide resources for, and explore deep learning approaches to, Māori-English code-switch detection.
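The abstract reports F1 gains but does not say how the score is aggregated. As a minimal sketch, assuming code-switch detection is framed as token-level language tagging with hypothetical labels `mi` (Māori) and `en` (English), and that F1 is macro-averaged over those two labels, the metric could be computed as follows. The function names, labels, example sentence and predictions are all invented for illustration and are not taken from the paper.

```python
from typing import Sequence


def per_class_f1(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one language label, treating every other label as negative."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 (assumed aggregation, not stated in the paper)."""
    scores = [per_class_f1(gold, pred, lab) for lab in labels]
    return sum(scores) / len(scores)


# Hypothetical code-switched sentence with per-token language tags.
tokens = ["Kia", "ora", "how", "are", "you"]
gold = ["mi", "mi", "en", "en", "en"]
pred = ["mi", "en", "en", "en", "en"]  # tagger mislabels "ora" as English

print(round(macro_f1(gold, pred, ["mi", "en"]), 4))  # 0.7619
```

Macro-averaging weights both languages equally, so mistakes on whichever language is the minority in a sentence count as much as mistakes on the majority language; a micro-averaged or Māori-only F1 would be equally plausible readings of the abstract.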
