HMMs for Unsupervised Vietnamese Word Segmentation
2019-05-16
Ba-Long Bui, Thi-Trang Nguyen, Huu-Hoang Nguyen, Kiem-Hieu Nguyen
Code
- github.com/longbb/word_recognition
Abstract
Word segmentation is an important problem in natural language processing. Most previous work on Vietnamese word segmentation relies on supervised learning. In this paper, we propose an unsupervised method for Vietnamese word segmentation based on Hidden Markov Models. We naturally encode prior linguistic knowledge into model learning. In decoding, we propose an enhancement of the Viterbi decoding algorithm with external token ordering statistics from Pointwise Mutual Information. Evaluation on benchmark datasets shows that the proposed method works reasonably well. Source code is available at https://github.com/longbb/word_recognition
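
The abstract does not spell out how Pointwise Mutual Information enters the Viterbi decoder, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a two-state HMM over syllables (`B` for a syllable that starts a word, `I` for one that continues a word) and adds the PMI of each adjacent syllable pair as a weighted bonus to the `I` score. All function names, the `weight` parameter, and the underscore joining of syllables are hypothetical choices made for illustration.

```python
import math
from collections import Counter

STATES = ("B", "I")  # B: syllable starts a word, I: syllable continues a word


def pmi_table(sentences):
    """PMI of adjacent syllable pairs: log(p(x, y) / (p(x) * p(y)))."""
    unigrams, bigrams = Counter(), Counter()
    for syllables in sentences:
        unigrams.update(syllables)
        bigrams.update(zip(syllables, syllables[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (x, y): math.log(
            (count / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni))
        )
        for (x, y), count in bigrams.items()
    }


def viterbi_pmi(syllables, log_start, log_trans, log_emit, pmi, weight=1.0):
    """Log-space Viterbi over B/I tags.

    Assumption (not from the paper): the PMI of (previous, current) syllable
    is added, scaled by `weight`, to the score of tagging the current
    syllable as I, so strongly associated pairs tend to be merged.
    """
    if not syllables:
        return []
    scores = [{s: log_start[s] + log_emit(s, syllables[0]) for s in STATES}]
    back = []
    for t in range(1, len(syllables)):
        bonus = weight * pmi.get((syllables[t - 1], syllables[t]), 0.0)
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for tag s at position t.
            prev = max(STATES, key=lambda p: scores[-1][p] + log_trans[p][s])
            row[s] = scores[-1][prev] + log_trans[prev][s] + log_emit(s, syllables[t])
            if s == "I":
                row[s] += bonus
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    tag = max(STATES, key=lambda s: scores[-1][s])
    tags = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        tags.append(tag)
    return tags[::-1]


def tags_to_words(syllables, tags):
    """Join syllables into words; an I tag attaches to the previous word."""
    words = []
    for syllable, tag in zip(syllables, tags):
        if tag == "B" or not words:
            words.append(syllable)
        else:
            words[-1] += "_" + syllable
    return words
```

In this sketch, `log_start` and `log_trans` are plain dictionaries of log-probabilities and `log_emit` is a callable `(state, syllable) -> float`. In an unsupervised setting these would come from training (for example with Baum-Welch), which is not shown here. Setting `log_start["I"]` to negative infinity ensures a sentence never begins inside a word.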