HausaMT v1.0: Towards English-Hausa Neural Machine Translation

2020-07-01 · WS 2020

Adewale Akinfaderin


Code

github.com/WalePhenomenon/Hausa-NMT

Abstract

Neural Machine Translation (NMT) for low-resource languages suffers from low performance because of the lack of large amounts of parallel data and because of language diversity. To help ameliorate this problem, we built a baseline model for English-Hausa machine translation, which is considered a low-resource language task. Hausa is the second largest Afro-Asiatic language in the world after Arabic, and it is the third largest language for trade across a large swath of West African countries, after English and French. In this paper, we curated several datasets containing Hausa-English parallel corpora for our translation task. We trained baseline models and evaluated their performance using recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
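To make the contrast between the two tokenization approaches concrete, here is a minimal Python sketch of word-level splitting next to a toy BPE learner. It is an illustration under stated assumptions, not the paper's pipeline: the function names (`word_level_tokenize`, `learn_bpe`, `merge_pair`, `bpe_tokenize`), the `</w>` end-of-word marker, the lowercasing step, and the tie-breaking rule are all choices made for this example, and the corpus is a toy one rather than the curated Hausa-English data.

```python
from collections import Counter


def word_level_tokenize(sentence):
    """Word-level tokenization: lowercase and split on whitespace (assumed here)."""
    return sentence.lower().split()


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of sentences."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter()
    for sentence in corpus:
        for word in word_level_tokenize(sentence):
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself (an assumption
        # made only to keep this sketch deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def bpe_tokenize(word, merges):
    """Segment one word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_corpus = ["low lower lowest", "new newer newest"]  # toy data only
    merges = learn_bpe(toy_corpus, num_merges=10)
    print(word_level_tokenize("Lowest newer"))
    print(bpe_tokenize("lowest", merges))
    print(bpe_tokenize("slower", merges))  # unseen word falls back to subwords
```

The practical difference is that a word-level vocabulary has no entry for a word it never saw, while BPE can always fall back to smaller learned units, which is one reason subword tokenization is commonly tried when parallel data is limited.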

Tasks

Decoder, Diversity, Machine Translation, NMT, Translation
