Bootstrapping a Romanian Corpus for Medical Named Entity Recognition
2017-09-01 · RANLP 2017
Maria Mitrofan
Abstract
Named Entity Recognition (NER) is an important component of natural language processing (NLP) with applications in the biomedical domain, where it enables knowledge discovery from medical texts. Because only a few linguistic resources specific to the biomedical domain exist for Romanian, we created a sub-corpus for this domain. In this paper we present the newly developed Romanian sub-corpus for medical-domain NER, a valuable asset for biomedical text processing. We describe the sub-corpus, provide statistics on its data composition, and evaluate an automatic NER tool on the new resource.
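The abstract does not say how the NER tool was scored against the sub-corpus. The sketch below assumes the common exact-match, entity-level convention: a predicted entity counts as correct only if both its span boundaries and its type match a gold annotation. The function name `evaluate_entities`, the `(start, end, label)` span format and the entity labels (`DISO`, `CHEM`, `ANAT`, `PROC`) are hypothetical illustrations, not the paper's actual tag set or evaluation code.

```python
from collections import Counter


def evaluate_entities(gold, predicted):
    """Exact-match, entity-level precision, recall and F1.

    Each entity is a hypothetical (start_token, end_token, label) tuple.
    A prediction is a true positive only if boundaries AND label match.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: each gold entity can be matched at most once.
    true_positives = sum((gold_counts & pred_counts).values())
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted annotations for one sentence.
gold = [(0, 2, "DISO"), (5, 6, "CHEM"), (9, 11, "ANAT")]
pred = [(0, 2, "DISO"), (5, 6, "PROC"), (9, 11, "ANAT"), (13, 14, "CHEM")]
p, r, f1 = evaluate_entities(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Here the span `(5, 6)` has the right boundaries but the wrong label, so under exact matching it counts as both a false positive and a false negative. A lenient scheme that credits boundary overlap would score it differently, which is why the matching criterion should always be reported alongside the numbers.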