Adapting the TTL Romanian POS Tagger to the Biomedical Domain

2017-09-01 · RANLP 2017

Maria Mitrofan, Radu Ion

Abstract

This paper presents the adaptation of the Hidden Markov Model-based TTL part-of-speech (POS) tagger to the biomedical domain. TTL is a text processing platform that performs sentence splitting, tokenization, POS tagging, chunking, and Named Entity Recognition (NER) for a number of languages, including Romanian. The accuracy of the TTL POS tagger exceeds 97% when TTL's baseline model is updated with training information from a Romanian biomedical corpus. This corpus is developed in the context of the CoRoLa project (a reference corpus for the contemporary Romanian language). A description and statistics of the Romanian biomedical corpus are also provided.
