SOTAVerified

Improving NER's Performance with Massive financial corpus

2020-07-31

Han Zhang


Abstract

Training large deep neural networks requires massive amounts of high-quality annotated data, but the time and labor costs are prohibitive for small businesses. We start from a company-name recognition task with a small-scale, low-quality training set, then apply techniques to improve training speed and prediction performance at minimal labor cost. The methods include pre-training a lite language model such as ALBERT-small or ELECTRA-small on a financial corpus, knowledge distillation, and multi-stage learning. As a result, we raise the recall rate by nearly 20 points and achieve 4 times the speed of a BERT-CRF model.
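The abstract mentions knowledge distillation as one of the techniques, but gives no details of the loss. A minimal sketch of the standard soft-target distillation loss (temperature-scaled softmax, in the style of Hinton et al.) is shown below; the temperature value and loss weighting here are illustrative assumptions, not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened predictions against the
    teacher's softened targets, scaled by T^2.

    Hypothetical sketch: the paper distills a large tagger into a lite
    model, but the exact loss is not specified in the abstract."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -T * T * sum(pt * math.log(ps)
                        for pt, ps in zip(p_teacher, p_student))
```

In a token-classification setting like NER, this loss would typically be computed per token and combined with the usual hard-label loss on the gold tags.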
