Contributions to Clinical Named Entity Recognition in Portuguese

2019-08-01 · WS 2019

Fábio Lopes, César Teixeira, Hugo Gonçalo Oliveira

Abstract

Bearing in mind that different languages may present different challenges, this paper presents the following contributions to Information Extraction from clinical text, targeting the Portuguese language: a collection of 281 clinical texts in Portuguese with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings for this task. Although the in-domain embeddings were learned from much less data, performance is higher when using them. When tested on 20 independent clinical texts, the model with in-domain embeddings achieved better results than a model using larger out-of-domain embeddings.
