
BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian

2020-05-01 · LREC 2020

Aleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén


Abstract

Akkadian is a fairly well-resourced extinct language that does not yet have a comprehensive morphological analyzer available. In this paper we describe a general finite-state based morphological model for Babylonian, a southern dialect of the Akkadian language, that can achieve a coverage of up to 97.3% and a recall of up to 93.7% on a token-level lemmatization and POS-tagging task from transcribed input. Since Akkadian word forms exhibit a high degree of morphological ambiguity, in that only 20.1% of running word tokens receive a single unambiguous analysis, we attempt a first pass at weighting our finite-state transducer, using existing extensive Akkadian corpora which have been partially validated for their lemmas and parts of speech but not for their entire morphological analyses. The resultant weighted finite-state transducer yields a moderate improvement, so that for 57.4% of the word tokens the highest-ranked analysis is the correct one. We conclude with a short discussion of how morphological ambiguity in the analysis of Akkadian could be further reduced with improvements in the training data used in weighting the finite-state transducer as well as through other, context-based techniques.
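
The idea of ranking ambiguous analyses with corpus-derived weights can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a simple unigram model in which each (lemma, POS) pair gets a weight equal to its negative log relative frequency in the validated corpus, a common convention for weighted finite-state transducers where the lowest total weight wins. The function names, the `unseen_penalty` value, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def train_weights(validated_pairs):
    """Map each (lemma, pos) pair to its negative log relative frequency."""
    counts = Counter(validated_pairs)
    total = sum(counts.values())
    return {pair: -math.log(n / total) for pair, n in counts.items()}


def rank_analyses(candidates, weights, unseen_penalty=20.0):
    """Order candidate analyses from lowest weight (most likely) to highest.

    Pairs never seen in the corpus receive a fixed, hypothetical penalty.
    """
    return sorted(candidates, key=lambda pair: weights.get(pair, unseen_penalty))


# Hypothetical toy corpus of validated (lemma, POS) annotations.
corpus = [("lemma_a", "N"), ("lemma_a", "N"), ("lemma_a", "N"), ("lemma_b", "V")]
weights = train_weights(corpus)

# Hypothetical candidate analyses that an unweighted analyzer might return
# for a single ambiguous token.
candidates = [("lemma_b", "V"), ("lemma_a", "N"), ("lemma_c", "AJ")]
ranked = rank_analyses(candidates, weights)
print(ranked[0])
```

Because this sketch uses only lemma and POS counts, it cannot separate analyses that share both but differ in the rest of their morphology, which matches the limitation the abstract notes for partially validated corpora.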
