Neural Word Segmentation with Rich Pretraining

2017-04-28 · ACL 2017 · Code Available

Jie Yang, Yue Zhang, Fei Dong

Abstract

Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive with the best methods on six benchmarks.
