Automatic Syllabification for Manipuri language
Loitongbam Gyanendro Singh, Lenin Laitonjam, Sanasam Ranbir Singh
Abstract
Developing hand-crafted rules for syllabifying the words of a language is an expensive task. This paper proposes several data-driven methods for automatic syllabification of words written in the Manipuri language, one of the scheduled languages of India. First, we propose a language-independent rule-based approach formulated using entropy-based phonotactic segmentation. Second, we cast syllabification as a sequence labeling problem and evaluate several sequence labeling approaches on it. Third, we combine sequence labeling with the rule-based method and evaluate the resulting hybrid approach. Our experiments show that the proposed methods outperform the baseline rule-based method. Entropy-based phonotactic segmentation achieves a word accuracy of 96%, a CRF (a sequence labeling approach) achieves 97%, and the hybrid approach achieves 98%.
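To make the sequence labeling formulation and the word accuracy metric concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify a tag set, so the two-tag scheme (`B` for a character that begins a syllable, `I` for any other character) is assumed, the function names are hypothetical, and the romanized toy word stands in for real Manipuri data.

```python
from typing import List


def syllables_to_labels(syllables: List[str]) -> List[str]:
    """Tag each character 'B' (begins a syllable) or 'I' (inside one).

    The B/I tag set is an assumption; the abstract does not name one.
    """
    labels: List[str] = []
    for syl in syllables:
        if not syl:
            continue  # ignore empty syllables rather than emit a stray tag
        labels.extend(["B"] + ["I"] * (len(syl) - 1))
    return labels


def labels_to_syllables(word: str, labels: List[str]) -> List[str]:
    """Rebuild the syllable list from a word and its per-character tags."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    syllables: List[str] = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not syllables:
            syllables.append(ch)   # start a new syllable
        else:
            syllables[-1] += ch    # extend the current syllable
    return syllables


def word_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of words whose whole predicted syllabification matches gold."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

For example, `syllables_to_labels(["ma", "ni", "pur"])` yields one tag per character of the hypothetical word, and a sequence labeler such as a CRF would be trained to predict those tags from character features. Word accuracy counts a word as correct only when every syllable boundary in it is right, which is why it is stricter than per-character tag accuracy.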