Multilingual Named Entity Recognition and Matching Using BERT and Dedupe for Slavic Languages

2021-04-01 · EACL (BSNLP) 2021

Marko Prelevikj, Slavko Zitnik

Abstract

This paper describes the University of Ljubljana (UL FRI) Group’s submissions to the shared task at the Balto-Slavic Natural Language Processing (BSNLP) 2021 Workshop. We experiment with multiple BERT-based models, pre-trained on multilingual, Croatian-Slovene-English, and Slovene-only data. We train the models iteratively and on the concatenated data of previously available NER datasets. For the normalization task we use the Stanza lemmatizer, while for entity matching we implement a baseline using the Dedupe library. The evaluation results suggest that multi-source settings outperform less-resourced approaches. The best NER models achieve an F-score of 0.91 on Slovene training data splits, while the best official submission achieves F-scores of 0.84 and 0.78 in the relaxed partial matching and strict settings, respectively. In the multilingual NER setting we achieve F-scores of 0.82 and 0.74.
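
To illustrate why the relaxed partial matching scores are higher than the strict ones, here is a minimal sketch of the two kinds of F-score. It is not the official BSNLP scorer: the function names, the representation of an entity as a (surface form, category) pair, and the rule that a partial match means "same category and at least one shared token" are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: (surface form, category), e.g. ("Ljubljana", "LOC").
Entity = Tuple[str, str]


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_f1(pred: Set[Entity], gold: Set[Entity]) -> float:
    """An entity counts only if form and category match the gold entry exactly."""
    if not pred or not gold:
        return 0.0
    true_positives = len(pred & gold)
    return f_score(true_positives / len(pred), true_positives / len(gold))


def partial_match(a: Entity, b: Entity) -> bool:
    """Assumed rule: same category and at least one shared lowercase token."""
    tokens_a = set(a[0].lower().split())
    tokens_b = set(b[0].lower().split())
    return a[1] == b[1] and bool(tokens_a & tokens_b)


def relaxed_partial_f1(pred: Set[Entity], gold: Set[Entity]) -> float:
    """An entity counts if it partially matches any entity on the other side."""
    if not pred or not gold:
        return 0.0
    matched_pred = sum(any(partial_match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(partial_match(p, g) for p in pred) for g in gold)
    return f_score(matched_pred / len(pred), matched_gold / len(gold))


# Toy data, invented for this example only.
gold = {("Univerza v Ljubljani", "ORG"), ("Ljubljana", "LOC")}
pred = {("Univerza", "ORG"), ("Ljubljana", "LOC")}

strict = strict_f1(pred, gold)
relaxed = relaxed_partial_f1(pred, gold)
```

On this toy input, the truncated prediction "Univerza" is rejected by the strict measure but accepted by the relaxed one, so the relaxed score can only be equal to or higher than the strict score.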
