Word Embeddings for the Armenian Language: Intrinsic and Extrinsic Evaluation

2019-06-07

Karen Avetisyan, Tsolak Ghukasyan

Code

github.com/ispras-texterra/word-embeddings-eval-hy
Official implementation, linked in the paper

Abstract

In this work, we evaluate and compare existing word embedding models for the Armenian language, both intrinsically and extrinsically. In addition, we present new embeddings trained using the GloVe, fastText, CBOW, and SkipGram algorithms. For intrinsic evaluation, we adapt and use the word analogy task. For extrinsic evaluation, two tasks are employed: morphological tagging and text classification. Tagging is performed with a deep neural network on the ArmTDP v2.3 dataset. For text classification, we propose a corpus of news articles categorized into 7 classes. The datasets are made public to serve as benchmarks for future models.
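As a rough illustration of the word analogy task mentioned in the abstract, the sketch below scores an embedding table on questions of the form "a is to b as c is to ?". It assumes the common vector-offset (3CosAdd) formulation, which the abstract does not confirm as the paper's exact procedure. The function names, the toy vectors, and the English example words are all hypothetical and are not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def solve_analogy(emb, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c) (3CosAdd, assumed)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The three question words are excluded from the candidates, as is customary.
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of in-vocabulary questions (a, b, c, d) answered with d."""
    answered = [q for q in questions if all(w in emb for w in q)]
    if not answered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answered)
    return correct / len(answered)


# Toy 3-dimensional vectors, invented purely for illustration.
TOY_EMB = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}

QUESTIONS = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
    ("man", "woman", "prince", "princess"),  # skipped: out of vocabulary
]

if __name__ == "__main__":
    print(solve_analogy(TOY_EMB, "man", "woman", "king"))
    print(analogy_accuracy(TOY_EMB, QUESTIONS))
```

In this sketch, questions containing out-of-vocabulary words are skipped rather than counted as errors. That is one possible convention; counting them as wrong would penalize models with smaller vocabularies.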

Tasks

Articles Classification, General Classification, Morphological Tagging, Text Classification, Word Embeddings
