Data Augmentation for Cross-Domain Named Entity Recognition

2021-09-04 · EMNLP 2021

Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

Code

github.com/ritual-uh/style_ner
Official · In paper · PyTorch · ★ 20

Abstract

Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is limited. In contrast, we study cross-domain data augmentation for the NER task. We investigate the possibility of leveraging data from high-resource domains by projecting it into the low-resource domains. Specifically, we propose a novel neural architecture that transforms the data representation from a high-resource to a low-resource domain by learning the textual patterns that differentiate them (e.g., style, noise, abbreviations) together with a shared feature space in which both domains are aligned. We experiment with diverse datasets and show that transforming the data to the low-resource domain representation achieves significant improvements over only using data from high-resource domains.
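
The idea of projecting high-resource data into the low-resource domain while keeping its entity labels usable can be illustrated with a toy sketch. The snippet below is not the paper's neural architecture: it swaps in a hand-written, token-level rule set (lowercasing plus a small hypothetical abbreviation table) as a stand-in for the learned transformation, and shows only that a one-to-one token mapping leaves BIO labels aligned. All names (`ABBREVIATIONS`, `to_low_resource_style`, `augment`) and the example sentences are assumptions made for illustration.

```python
from typing import List, Tuple

# A labeled sentence: tokens paired one-to-one with BIO entity tags.
Sentence = Tuple[List[str], List[str]]

# Hypothetical abbreviation table standing in for learned style patterns.
ABBREVIATIONS = {"tomorrow": "tmrw", "please": "pls", "you": "u"}


def to_low_resource_style(tokens: List[str]) -> List[str]:
    """Rule-based stand-in for the learned transformation (not the paper's model).

    Each input token maps to exactly one output token, so labels stay aligned.
    """
    styled = []
    for token in tokens:
        lowered = token.lower()
        styled.append(ABBREVIATIONS.get(lowered, lowered))
    return styled


def augment(high_resource: List[Sentence], low_resource: List[Sentence]) -> List[Sentence]:
    """Return the low-resource data followed by transformed high-resource data."""
    transformed = [
        (to_low_resource_style(tokens), list(tags))  # tags are copied unchanged
        for tokens, tags in high_resource
    ]
    return list(low_resource) + transformed


# Illustrative data: one newswire-like sentence, one social-media-like sentence.
news = [(["Please", "call", "John", "Smith", "tomorrow"],
         ["O", "O", "B-PER", "I-PER", "O"])]
social = [(["cu", "in", "nyc"], ["O", "O", "B-LOC"])]

train = augment(news, social)
for tokens, tags in train:
    print(list(zip(tokens, tags)))
```

The combined `train` list is what a downstream NER tagger would be fit on. A learned model would replace `to_low_resource_style`; the constraint to preserve is that every output token still corresponds to one label.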

Tasks

Cross-Domain Named Entity Recognition, Data Augmentation, Named Entity Recognition (NER)
