TextMosaic: A New Data Augmentation Method for Named Entity Recognition Using Document-Level Contexts
Anonymous
Abstract
Named Entity Recognition (NER) often suffers from a lack of large and diverse annotated data, and recent advances in pre-training techniques have shown great power on such low-resource tasks. However, the robustness of NER models is still insufficient, which motivates us to seek an efficient text augmentation method. Inspired by the mosaic augmentation method for object detection, this paper puts forward a novel data augmentation method for NER named TextMosaic, which combines span sampling, over-sampling, and random sampling and takes full account of context-sensitive relevance. In addition, a sliding window is used during sampling to capture rich document-level information and to mitigate label imbalance. Our proposed method ranked first in the robustness evaluation of CCIR Cup 2021. We also conduct extensive experiments on the OntoNotes 4.0 dataset, on which our method achieves higher accuracy and higher robustness for NER at the same time. Moreover, it consumes fewer computing resources, so the model can run efficiently on a 1080 Ti GPU. The code will be open-sourced on GitHub.
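The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of how sliding-window sampling over a document and over-sampling of rare entity types might look. The function names (`sliding_windows`, `oversample_rare`), the `window`, `stride`, and `min_count` parameters, the BIO tagging scheme, and the toy document are all assumptions made for illustration; this is not the authors' implementation, and span sampling and random sampling are not covered.

```python
from collections import Counter


def sliding_windows(sentences, window=2, stride=1):
    """Concatenate `window` consecutive sentences into one training sample.

    `sentences` is a list of (tokens, labels) pairs from a single document.
    Hypothetical helper: window and stride values are illustrative only.
    """
    samples = []
    last_start = max(len(sentences) - window, 0)
    for start in range(0, last_start + 1, stride):
        chunk = sentences[start:start + window]
        tokens = [tok for sent_tokens, _ in chunk for tok in sent_tokens]
        labels = [lab for _, sent_labels in chunk for lab in sent_labels]
        samples.append((tokens, labels))
    return samples


def oversample_rare(samples, min_count=2):
    """Duplicate every sample that contains an entity type seen < min_count times.

    Entity types are read from BIO "B-" tags (an assumed tagging scheme).
    """
    counts = Counter(
        lab[2:] for _, labels in samples for lab in labels if lab.startswith("B-")
    )
    rare = {etype for etype, n in counts.items() if n < min_count}
    extra = [
        sample for sample in samples
        if any(lab.startswith("B-") and lab[2:] in rare for lab in sample[1])
    ]
    return samples + extra


# Toy document with three sentences (made-up data, for illustration only).
doc = [
    (["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"]),
    (["She", "met", "Bob"], ["O", "O", "B-PER"]),
    (["They", "joined", "Acme"], ["O", "O", "B-ORG"]),
]

windows = sliding_windows(doc, window=2, stride=1)
augmented = oversample_rare(windows, min_count=2)
```

In this sketch each window joins neighbouring sentences so that a sample carries context from beyond its own sentence, and windows containing infrequent entity types are repeated to soften label imbalance. How TextMosaic actually selects spans and balances labels may differ.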