Multifaceted Assessments of Traditional Chinese Word Segmentation Tool on Large Corpora

2022-11-01 · ROCLING 2022

Wen-Chao Yeh, Yu-Lun Hsieh, Yung-Chun Chang, Wen-Lian Hsu

Abstract

This study evaluates the three most popular word segmentation tools on a large Traditional Chinese corpus in terms of efficiency, resource consumption, and cost. Specifically, we compare the performance of Jieba, CKIP, and MONPA on word segmentation, part-of-speech tagging, and named entity recognition through extensive experiments. Experimental results show that MONPA, when using a GPU for batch segmentation, can greatly reduce the processing time of massive datasets. In addition, its support for word segmentation, part-of-speech tagging, and named entity recognition is beneficial to downstream applications.
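As an illustration of the kind of efficiency and resource measurement the abstract describes, the following is a minimal sketch, not the authors' evaluation code. The `benchmark` helper, the `char_segmenter` stand-in, and the two sample sentences are all hypothetical; a real comparison would pass in wrappers around Jieba, CKIP, or MONPA instead. Note that `tracemalloc` only tracks Python-level allocations, so it would not capture GPU memory.

```python
import time
import tracemalloc
from typing import Callable, Dict, List

# Any function that maps a sentence to a list of tokens can be benchmarked.
Segmenter = Callable[[str], List[str]]


def char_segmenter(text: str) -> List[str]:
    """Hypothetical stand-in segmenter: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def benchmark(segmenter: Segmenter, corpus: List[str]) -> Dict[str, float]:
    """Measure wall-clock time and peak Python memory for one segmenter."""
    tracemalloc.start()
    start = time.perf_counter()
    total_tokens = 0
    for sentence in corpus:
        total_tokens += len(segmenter(sentence))
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "sentences": float(len(corpus)),
        "tokens": float(total_tokens),
        "seconds": elapsed,
        "peak_bytes": float(peak),
    }


# Tiny illustrative corpus (assumed data, not from the paper).
corpus = ["今天天氣很好", "我們去台北看電影"] * 1000
result = benchmark(char_segmenter, corpus)
print(result)
```

Because `benchmark` accepts any callable, swapping in a real tool should only require a thin wrapper that returns a list of tokens, which keeps the timing loop identical across tools.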
