
Evaluating Ensemble Based Pre-annotation on Named Entity Corpus Construction in English and Chinese

2016-12-01 · WS 2016

Tingming Lu, Man Zhu, Zhiqiang Gao, Yaocheng Gui


Abstract

Annotated corpora are crucial language resources, and pre-annotation is a common way to reduce the cost of corpus construction. The ensemble-based pre-annotation approach combines multiple existing named entity taggers and categorizes their annotations into normal annotations, which have high confidence, and candidate annotations, which have low confidence, in order to reduce human annotation time. In this paper, we manually annotate three English datasets under various pre-annotation conditions, report the effects of ensemble-based pre-annotation, and analyze the experimental results. To verify that ensemble-based pre-annotation is also effective in other languages such as Chinese, we additionally test three Chinese datasets. The experimental results show that, on both English and Chinese datasets, the ensemble-based pre-annotation approach significantly reduces the number of annotations that human annotators have to add, and outperforms the baseline approaches in reducing human annotation time without loss in annotation performance (in terms of F1-measure).
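The abstract does not say how the ensemble decides which annotations count as high confidence. The following Python snippet is only a minimal sketch of the normal/candidate split under one assumed rule: a span becomes a normal annotation when the fraction of taggers proposing it reaches a threshold (by default, all taggers agree), and any other proposed span becomes a candidate for a human annotator to accept or reject. The span representation `(start, end, entity_type)`, the function name `categorize_annotations`, and the parameter `normal_threshold` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical span representation: (start_offset, end_offset, entity_type).
Span = tuple[int, int, str]


def categorize_annotations(
    tagger_outputs: list[set[Span]],
    normal_threshold: float = 1.0,
) -> tuple[set[Span], set[Span]]:
    """Split ensemble predictions into (normal, candidate) annotations.

    Assumed rule, not the paper's: a span is 'normal' (high confidence) when
    the fraction of taggers proposing it is at least normal_threshold; every
    other span proposed by at least one tagger is a 'candidate'.
    """
    if not tagger_outputs:
        return set(), set()
    # Each tagger's output is a set, so a tagger votes at most once per span.
    votes = Counter(span for output in tagger_outputs for span in output)
    n_taggers = len(tagger_outputs)
    normal = {span for span, count in votes.items()
              if count / n_taggers >= normal_threshold}
    candidate = set(votes) - normal
    return normal, candidate


# Usage: three hypothetical taggers agree on a person name but disagree
# about the type of a second mention, so that mention is left for review.
taggers = [
    {(0, 5, "PER"), (10, 16, "LOC")},
    {(0, 5, "PER"), (10, 16, "ORG")},
    {(0, 5, "PER")},
]
normal, candidate = categorize_annotations(taggers)
print(sorted(normal))     # [(0, 5, 'PER')]
print(sorted(candidate))  # [(10, 16, 'LOC'), (10, 16, 'ORG')]
```

Conflicting labels for the same text are both kept as candidates here, so the annotator chooses between them instead of typing the annotation from scratch; lowering `normal_threshold` (for example to a majority vote) would trade more pre-accepted annotations for a higher risk of accepting tagger errors.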
