
Question Answering Classification for Amharic Social Media Community Based Questions

2022-06-01 · SIGUL (LREC) 2022 · Code Available

Tadesse Destaw, Seid Muhie Yimam, Abinew Ayele, Chris Biemann

Abstract

In this work, we build a Question Answering (QA) classification dataset from a social media platform, namely the public Telegram channel @AskAnythingEthiopia. The channel has more than 78k subscribers and has existed since May 31, 2019. It allows users to ask questions from various domains, such as politics, economics, health, and education. Questions are posted in Amharic, in English, or in Amharic written in Latin script. Because the questions are code-mixed, we apply different strategies to pre-process the dataset. As part of the pre-processing tools, we build a Latin-to-Ethiopic script transliteration tool. We collect 8k Amharic and 24k transliterated questions and develop deep learning-based question answering classifiers that attain an F-score of up to 57.29 across 20 question classes. The datasets and pre-processing scripts are open-sourced to facilitate further research on Amharic community-based question answering.
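The abstract notes that questions arrive in Ethiopic script, Latin script, or a mix, and that pre-processing has to treat them differently. As an illustration only, and not the authors' released tooling, a minimal sketch of one plausible first step is to tag each question by the script of its letters. The function names and the four labels below are hypothetical; the code point ranges are the Unicode Ethiopic blocks.

```python
import string

# Unicode blocks that hold Ethiopic characters (inclusive code point ranges).
ETHIOPIC_RANGES = [
    (0x1200, 0x137F),  # Ethiopic
    (0x1380, 0x139F),  # Ethiopic Supplement
    (0x2D80, 0x2DDF),  # Ethiopic Extended
    (0xAB00, 0xAB2F),  # Ethiopic Extended-A
]


def is_ethiopic(ch: str) -> bool:
    """Return True if the character lies in one of the Ethiopic blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ETHIOPIC_RANGES)


def classify_script(text: str) -> str:
    """Label text as 'ethiopic', 'latin', 'mixed', or 'other' (hypothetical labels).

    Only alphabetic characters are counted, so digits, whitespace, and
    punctuation do not influence the label.
    """
    letters = [ch for ch in text if ch.isalpha()]
    has_ethiopic = any(is_ethiopic(ch) for ch in letters)
    has_latin = any(ch in string.ascii_letters for ch in letters)
    if has_ethiopic and has_latin:
        return "mixed"
    if has_ethiopic:
        return "ethiopic"
    if has_latin:
        return "latin"
    return "other"


if __name__ == "__main__":
    for sample in ["ሰላም", "hello world", "ሰላም hello", "123 ?!"]:
        print(repr(sample), classify_script(sample))
```

A script check like this cannot tell English from Amharic written in Latin script, since both use the same letters. Separating those two cases would need a further step, such as a language identifier or a lexicon lookup, before any transliteration is applied.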
