Contributions to Clinical Named Entity Recognition in Portuguese
Fábio Lopes, César Teixeira, Hugo Gonçalo Oliveira
Code: github.com/fabioacl/PortugueseClinicalNER
Abstract
Since different languages may pose different challenges, this paper presents the following contributions to Information Extraction from clinical text, targeting the Portuguese language: a collection of 281 clinical texts in Portuguese, with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings in this task. Although the in-domain embeddings were learned from much less data, performance is higher when using them. When tested on 20 independent clinical texts, the model with in-domain embeddings achieved better results than a model using larger out-of-domain embeddings.