Tuning Multilingual Transformers for Named Entity Recognition on Slavic Languages

2019-01-30 · Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2019)

Mikhail Arkhipov, Maria Trofimova, Yuri Kuratov, Alexey Sorokin

Abstract

Our paper addresses the problem of multilingual named entity recognition for four Slavic languages: Russian, Bulgarian, Czech, and Polish. We solve this task with the BERT model, using a multilingual model trained on one hundred languages as the base for transfer to the target Slavic languages. Unsupervised pre-training of the BERT model on these four languages allows us to significantly outperform both baseline neural approaches and multilingual BERT. A further improvement is achieved by extending BERT with a word-level CRF layer. Our system was submitted to the BSNLP 2019 Shared Task on Multilingual Named Entity Recognition and took first place in three of the four competition metrics in which we participated. We have open-sourced the NER models and the BERT model pre-trained on the four Slavic languages.
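The architecture described in the abstract (a BERT encoder with a CRF layer over the output tags) can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the `transformers` and `pytorch-crf` packages, uses the public `bert-base-multilingual-cased` checkpoint as a stand-in for the Slavic-language BERT, picks an illustrative tag count, and for brevity runs the CRF over all subtokens, whereas the paper applies it at the word level.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # pip install transformers
from torchcrf import CRF            # pip install pytorch-crf

class BertCrfTagger(nn.Module):
    """Sketch of a BERT encoder with a CRF tagging head for NER."""

    def __init__(self, model_name="bert-base-multilingual-cased", num_tags=9):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        # Project each token representation to per-tag emission scores.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence
            # under the CRF, computed over non-padding positions.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi decoding of the most likely tag sequence.
        return self.crf.decode(emissions, mask=mask)
```

At inference time, `decode` returns one list of tag indices per sentence, which can be mapped back to BIO labels; during training, the returned loss is backpropagated as usual.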
