TextMosaic: A New Data Augmentation Method for Named Entity Recognition Using Document-Level Contexts

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Named Entity Recognition (NER) often lacks large and diverse annotated data, and recent advances in pre-training techniques have shown great power on such low-resource tasks. However, the robustness of NER models is still insufficient, which motivates us to seek an efficient text augmentation method. Inspired by the mosaic augmentation method for object detection, this paper puts forward a novel data augmentation method for NER named TextMosaic, which combines span sampling, over-sampling, and random sampling and takes full account of context-sensitive relevance. A sliding window is used during sampling to capture rich document-level information and to address label imbalance. Our proposed method ranked first in the robustness evaluation of CCIR Cup 2021. We also conduct extensive experiments on the OntoNotes 4.0 dataset, on which our method achieves higher accuracy and robustness for NER simultaneously. In addition, it consumes fewer computing resources, allowing the model to run efficiently on a 1080 Ti GPU. The code will be open-sourced on GitHub.
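The abstract gives no implementation details, so the following is only a rough sketch of how sliding-window sampling with over-sampling of rare labels might look. It is not the authors' code. The function names (`sliding_windows`, `mosaic_samples`), the parameters (`window`, `stride`, `rare_boost`), and the rule that a "rare" label is one with the lowest count in the document are all assumptions made for illustration. Span sampling is not covered, and random sampling is reduced to a shuffle.

```python
import random
from collections import Counter


def sliding_windows(sentences, window=3, stride=1):
    """Yield consecutive groups of `window` sentences from one document."""
    if len(sentences) <= window:
        # Short documents produce a single window holding everything.
        yield list(sentences)
        return
    for start in range(0, len(sentences) - window + 1, stride):
        yield sentences[start:start + window]


def mosaic_samples(document, window=3, stride=1, rare_boost=2, seed=0):
    """Build training samples from a document.

    `document` is a list of sentences; each sentence is a list of
    (token, label) pairs. Assumption: "O" marks non-entity tokens.
    """
    rng = random.Random(seed)

    # Count entity labels to find the least frequent ones (assumed rule).
    counts = Counter(
        label for sent in document for _, label in sent if label != "O"
    )
    rarest = min(counts.values()) if counts else 0
    rare = {label for label, n in counts.items() if n == rarest}

    samples = []
    for win in sliding_windows(document, window, stride):
        # Concatenate neighbouring sentences into one longer sample.
        merged = [pair for sent in win for pair in sent]
        samples.append(merged)
        # Over-sample: repeat windows that contain a rare label.
        if any(label in rare for _, label in merged):
            samples.extend([merged] * (rare_boost - 1))

    # Shuffle so repeated windows are not adjacent in the output.
    rng.shuffle(samples)
    return samples
```

Concatenating neighbouring sentences gives each sample some document-level context, and repeating windows that contain rare labels is one simple way to reduce label imbalance; the actual method may differ on both points.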

Tasks

Data Augmentation, GPU, Named Entity Recognition (NER), Object Detection
