Morphology-Aware Meta-Embeddings for Tamil

2021-06-01 · NAACL 2021 · Code Available

Arjun Sai Krishnan, Seyoon Ragavan

Abstract

In this work, we explore generating morphologically enhanced word embeddings for Tamil, a highly agglutinative South Indian language with rich morphology that remains low-resource with regard to NLP tasks. We present the first word analogy dataset for Tamil, consisting of 4499 hand-curated word tetrads across 10 semantic and 13 morphological relation types. Using a rule-based segmenter to capture morphology, together with meta-embedding techniques, we train meta-embeddings that outperform existing baselines by 16% on our analogy task and appear to mitigate a previously observed trade-off between semantic and morphological accuracy.
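As a rough illustration of the two ideas named in the abstract, the sketch below builds a meta-embedding from two source embeddings and scores it on analogy tetrads (a : b :: c : d). It is not the authors' implementation. Concatenating L2-normalized sources and the 3CosAdd scoring rule are common choices assumed here, and the function names, the English placeholder vocabulary, and the toy vectors are all hypothetical.

```python
import numpy as np

def l2_normalize(matrix):
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)

def concat_meta_embedding(sources):
    """Assumed meta-embedding: concatenate row-aligned, L2-normalized sources."""
    return np.hstack([l2_normalize(m) for m in sources])

def solve_analogy(vocab, emb, a, b, c):
    """3CosAdd: return the word closest to b - a + c, excluding a, b and c."""
    index = {w: i for i, w in enumerate(vocab)}
    unit = l2_normalize(emb)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target
    for w in (a, b, c):
        scores[index[w]] = -np.inf  # query words may not be the answer
    return vocab[int(np.argmax(scores))]

def analogy_accuracy(vocab, emb, tetrads):
    """Fraction of tetrads (a, b, c, d) for which the predicted word equals d."""
    hits = sum(solve_analogy(vocab, emb, a, b, c) == d for a, b, c, d in tetrads)
    return hits / len(tetrads)

# Toy data (hypothetical): English placeholders stand in for Tamil words.
vocab = ["man", "woman", "king", "queen", "apple"]
source_a = np.array([  # columns: royal, male, female, fruit
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)
source_b = np.array([  # columns: person, object
    [1, 0],
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
], dtype=float)

meta = concat_meta_embedding([source_a, source_b])
tetrads = [("man", "king", "woman", "queen"), ("king", "man", "queen", "woman")]
accuracy = analogy_accuracy(vocab, meta, tetrads)
```

Normalizing each source before concatenation keeps one source from dominating the cosine scores purely because its vectors are longer. Other meta-embedding schemes, such as averaging or learned projections, would slot into the same evaluation loop.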