SOTAVerified

Improving NER's Performance with Massive financial corpus

2020-07-31

Han Zhang


Abstract

Training large deep neural networks requires massive amounts of high-quality annotated data, but the time and labor costs are prohibitive for small businesses. We start from a company-name recognition task with a small-scale, low-quality training set, then apply techniques to improve training speed and prediction performance at minimal labor cost. The methods include pre-training a lite language model such as ALBERT-small or ELECTRA-small on a financial corpus, knowledge distillation, and multi-stage learning. As a result, we raise the recall rate by nearly 20 points and achieve 4 times the speed of a BERT-CRF model.
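The abstract mentions knowledge distillation as one of the techniques, but gives no details of the loss. A minimal sketch of the standard soft-target distillation loss (temperature-scaled softmax, in the style of Hinton et al.) is shown below; the temperature value and loss weighting here are illustrative assumptions, not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened predictions against the
    teacher's softened targets, scaled by T^2.

    Hypothetical sketch: the paper distills a large tagger into a lite
    model, but the exact loss is not specified in the abstract."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -T * T * sum(pt * math.log(ps)
                        for pt, ps in zip(p_teacher, p_student))
```

In a token-classification setting like NER, this loss would typically be computed per token and combined with the usual hard-label loss on the gold tags.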
