
Unsupervised Chinese Word Segmentation with BERT Oriented Probing and Transformation

2022-05-01 · Findings (ACL) 2022 · Code Available

Wei Li, Yuhan Song, Qi Su, Yanqiu Shao


Abstract

Word segmentation is a fundamental step in understanding the Chinese language. Previous neural approaches to unsupervised Chinese Word Segmentation (CWS) exploit only shallow semantic information, which can miss important context. Large-scale pre-trained language models (PLMs) have achieved great success in many areas because of their ability to capture deep contextual semantic relations. In this paper, we propose to take advantage of the deep semantic information embedded in PLMs (e.g., BERT) in a self-training manner, which iteratively probes the PLM and transforms the semantic information it encodes into explicit word segmentation ability. Extensive experimental results show that our proposed approach achieves state-of-the-art F1 scores on two CWS benchmark datasets.
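To make the last two steps of that pipeline concrete, here is a minimal sketch of how per-gap boundary scores could be turned into a segmentation, and how word-level F1 is typically computed for CWS. It assumes a probe of a PLM has already produced one score per gap between adjacent characters (higher meaning "more likely a word boundary"); how those scores are obtained, the `threshold` value, and the function names `segment_from_gap_scores`, `to_spans`, and `word_f1` are hypothetical illustrations, not the authors' method or code.

```python
from typing import List, Set, Tuple


def segment_from_gap_scores(chars: str, gap_scores: List[float],
                            threshold: float = 0.5) -> List[str]:
    """Split `chars` into words. gap_scores[i] scores the gap between
    chars[i] and chars[i + 1]; a boundary is placed there when the score
    reaches `threshold`. The scores are ASSUMED to come from some PLM probe."""
    if len(gap_scores) != max(len(chars) - 1, 0):
        raise ValueError("need exactly one score per gap between adjacent characters")
    words: List[str] = []
    start = 0
    for i, score in enumerate(gap_scores):
        if score >= threshold:
            words.append(chars[start:i + 1])  # close the current word after chars[i]
            start = i + 1
    if chars:
        words.append(chars[start:])  # the last word runs to the end of the sentence
    return words


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a word sequence to its set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def word_f1(pred: List[str], gold: List[str]) -> float:
    """Word-level F1: a predicted word counts as correct only if its exact
    character span also appears in the gold segmentation."""
    p, g = to_spans(pred), to_spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "我喜欢自然语言处理"
    gold = ["我", "喜欢", "自然", "语言", "处理"]
    # Made-up scores for illustration: the gap between 然 and 语 scores low,
    # so 自然语言 is (wrongly) kept as one word.
    scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.1, 0.7, 0.2]
    pred = segment_from_gap_scores(sentence, scores)
    print(pred)                            # ['我', '喜欢', '自然语言', '处理']
    print(round(word_f1(pred, gold), 4))   # 0.6667 (P = 3/4, R = 3/5)
```

Scoring exact spans rather than individual boundary decisions is why merging 自然 and 语言 costs both a precision and a recall error: the merged span matches nothing in the gold set, and two gold words go unmatched.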
