SOTAVerified

Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts

2021-06-01NAACL (BioNLP) 2021Code Available0· sign in to hype

Yang Liu, Yuanhe Tian, Tsung-Hui Chang, Song Wu, Xiang Wan, Yan Song

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Chinese word segmentation (CWS) and medical concept recognition are two fundamental tasks to process Chinese electronic medical records (EMRs) and play important roles in downstream tasks for understanding Chinese EMRs. One challenge to these tasks is the lack of medical domain datasets with high-quality annotations, especially medical-related tags that reveal the characteristics of Chinese EMRs. In this paper, we collected a Chinese EMR corpus, namely, ACEMR, with human annotations for Chinese word segmentation and EMR-related tags. On the ACEMR corpus, we run well-known models (i.e., BiLSTM, BERT, and ZEN) and existing state-of-the-art systems (e.g., WMSeg and TwASP) for CWS and medical concept recognition. Experimental results demonstrate the necessity of building a dedicated medical dataset and show that models that leverage extra resources achieve the best performance for both tasks, which provides certain guidance for future studies on model selection in the medical domain.

Tasks

Reproductions