Multifaceted Assessments of Traditional Chinese Word Segmentation Tool on Large Corpora
2022-11-01 · ROCLING 2022
Wen-Chao Yeh, Yu-Lun Hsieh, Yung-Chun Chang, Wen-Lian Hsu
Abstract
This study evaluates the three most popular word segmentation tools on a large Traditional Chinese corpus in terms of efficiency, resource consumption, and cost. Specifically, we compare the performance of Jieba, CKIP, and MONPA on word segmentation, part-of-speech tagging, and named entity recognition through extensive experiments. Experimental results show that MONPA, when using a GPU for batch segmentation, can greatly reduce the processing time of massive datasets. In addition, its features, such as word segmentation, part-of-speech tagging, and named entity recognition, are beneficial to downstream applications.