Word Embedding Evaluation for Sinhala

2020-05-01 · LREC 2020

Dimuthu Lakmal, Surangika Ranathunga, Saman Peramuna, Indu Herath

Abstract

This paper presents the first comprehensive evaluation of different types of word embeddings for the Sinhala language. Three standard word embedding models, namely Word2Vec (both Skip-gram and CBOW), FastText, and GloVe, are evaluated using two types of methods: intrinsic evaluation and extrinsic evaluation. Word analogy and word relatedness were used as the intrinsic evaluation tasks, while sentiment analysis and part-of-speech (POS) tagging served as the extrinsic evaluation tasks. The benchmark datasets used for intrinsic evaluation were carefully crafted to account for specific linguistic features of Sinhala. In general, 300-dimensional FastText embeddings achieved the best accuracy across all evaluation tasks, while GloVe produced the lowest results.
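The abstract does not spell out how the two intrinsic tasks are scored. As a hedged illustration only, the sketch below assumes two common choices: word analogy answered with the 3CosAdd rule (pick the word whose vector is closest to b - a + c by cosine similarity, excluding the three query words), and word relatedness scored as Spearman's rank correlation between model cosine similarities and human ratings. The toy three-dimensional vectors, the English placeholder words, and the human scores are all hypothetical; they are not the paper's Sinhala benchmarks or trained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(emb, a, b, c):
    """3CosAdd (assumed protocol): 'a is to b as c is to ?', answered by argmax cos(w, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))  # query words are excluded
    return max(candidates, key=lambda w: cosine(emb[w], target))

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def relatedness_score(emb, pairs):
    """Correlate model cosine similarities with human scores given (word1, word2, score) triples."""
    model = [cosine(emb[w1], emb[w2]) for w1, w2, _ in pairs]
    human = [score for _, _, score in pairs]
    return spearman(model, human)

# Hypothetical toy embeddings and human relatedness ratings, for illustration only.
emb = {
    "king": [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man": [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.0, 0.0, 1.0],
}
pairs = [("king", "queen", 8.5), ("man", "woman", 8.0), ("king", "apple", 1.5)]

print(analogy(emb, "man", "king", "woman"))  # → queen
print(relatedness_score(emb, pairs))         # close to 1.0 for this toy data
```

In a real evaluation the dictionary would hold trained Word2Vec, FastText, or GloVe vectors, and analogy accuracy would be the fraction of benchmark questions whose predicted word matches the expected answer.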
