Incorporating Domain Knowledge into Language Transformers for Multi-Label Classification of Chinese Medical Questions

2021-10-01 · ROCLING 2021

Po-Han Chen, Yu-Xiang Zeng, Lung-Hao Lee

Abstract

In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. The resulting collection contains 1,814 questions with 2,340 labels, an average of 1.29 labels per question. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers, BERT and RoBERTa, were implemented to compare performance on our constructed datasets. Experimental results showed that our proposed model with the knowledge infusion mechanism achieves better performance on every evaluation metric considered: Macro F1, Micro F1, Weighted F1, and Subset Accuracy.
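
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy gold/predicted label sets are assumptions made only for illustration, and the definitions follow the standard multi-label conventions (per-label F1 averaged in three ways, plus exact-match accuracy). The last line checks the label-per-question average reported in the abstract.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Macro/Micro/Weighted F1 and Subset Accuracy for multi-label output."""
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                tp[label] += 1
            elif label in p:
                fp[label] += 1
            elif label in g:
                fn[label] += 1

    per_label = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    support = {label: tp[label] + fn[label] for label in labels}
    total_support = sum(support.values())

    return {
        # Macro: unweighted mean of per-label F1.
        "macro_f1": sum(per_label.values()) / len(labels),
        # Micro: F1 over counts pooled across all labels.
        "micro_f1": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Weighted: per-label F1 weighted by the number of gold instances.
        "weighted_f1": (
            sum(per_label[label] * support[label] for label in labels) / total_support
            if total_support
            else 0.0
        ),
        # Subset Accuracy: share of questions whose label set matches exactly.
        "subset_accuracy": sum(g == p for g, p in zip(gold, pred)) / len(gold),
    }


# Toy data (hypothetical), using four of the eight classes from the abstract.
labels = ["symptom", "treatment", "disease", "cause"]
gold = [{"symptom", "treatment"}, {"disease"}, {"cause"}, {"treatment"}]
pred = [{"symptom"}, {"disease"}, {"cause", "treatment"}, {"treatment"}]

scores = multilabel_scores(gold, pred, labels)
print(scores)

# Average labels per question, from the counts reported in the abstract.
label_cardinality = round(2340 / 1814, 2)
print(label_cardinality)
```

Subset Accuracy is the strictest of the four, since a question counts as correct only when every one of its labels is predicted and no extra label is added; the F1 variants give partial credit per label.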

Tasks

Multi-Label Classification
