Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure

2017-11-01 · IJCNLP 2017

Shen Huang, Xu Sun, Houfeng Wang

Abstract

Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods because they can exploit unlabeled data to improve Out-of-Vocabulary (OOV) word recognition. Although various neural network methods for CWS have achieved performance competitive with state-of-the-art systems, they are constrained by the domain and size of the training corpus and do not work well in domain adaptation. In this paper, we propose a novel BLSTM-based neural network model that incorporates a global recurrent structure designed to model boundary features dynamically. Experiments show that the proposed structure effectively boosts CWS performance, especially OOV-Recall, which benefits domain adaptation. We achieve state-of-the-art results on 6 domains of CNKI articles, and results competitive with the best reported on the 4 domains of the SIGHAN Bakeoff 2010 data.
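
The abstract singles out OOV-Recall as the metric that improves most. As a rough illustration of what that metric measures, the sketch below computes it under the usual span-matching convention: a gold word counts as recalled only if the predicted segmentation contains a word with exactly the same character span. The function names (`to_spans`, `oov_recall`) and the toy sentences are hypothetical and are not taken from the paper or from the official SIGHAN scoring script.

```python
def to_spans(words):
    """Convert a segmented sentence (list of words) into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def oov_recall(gold_sents, pred_sents, train_vocab):
    """Fraction of gold words absent from train_vocab whose exact span appears in the prediction."""
    total = hit = 0
    for gold, pred in zip(gold_sents, pred_sents):
        pred_spans = to_spans(pred)
        start = 0
        for w in gold:
            end = start + len(w)
            if w not in train_vocab:  # OOV with respect to the training vocabulary
                total += 1
                if (start, end) in pred_spans:
                    hit += 1
            start = end
    # Return 0.0 when there are no OOV words (an assumption made here to avoid division by zero).
    return hit / total if total else 0.0


# Toy data, invented for illustration only.
train_vocab = {"我", "喜欢", "很", "大"}
gold = [["我", "喜欢", "天安门"], ["故宫", "很", "大"]]
pred = [["我", "喜欢", "天安门"], ["故", "宫", "很", "大"]]
print(oov_recall(gold, pred, train_vocab))
```

In this toy case the two OOV words are 天安门 and 故宫; only the first is segmented with the correct boundaries, so one of the two is recalled.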
