Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

2023-11-15 · arXiv 2023 · Code Available

Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter

Code

github.com/opennlg/openba-v2 (pytorch, ★ 25)
github.com/UniversalNER/uner_code (★ 4)

Abstract

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines in both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.

Tasks

Cross-Lingual NER, Multilingual Named Entity Recognition, Named Entity Recognition (NER)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
UNER v1 (Chinese)	UNER XLM-R	F1 (micro)	89.5	—	Unverified
UNER v1 (Chinese Simplified)	UNER XLM-R	F1 (micro)	89.4	—	Unverified
UNER v1 (Croatian)	UNER XLM-R	F1 (micro)	93.6	—	Unverified
UNER v1 (Danish)	UNER XLM-R	F1 (micro)	82.7	—	Unverified
UNER v1 (English)	UNER XLM-R	F1 (micro)	86.0	—	Unverified
UNER v1 (Portuguese)	UNER XLM-R	F1 (micro)	90.4	—	Unverified
UNER v1 - PUD (Chinese)	UNER XLM-R	F1 (micro)	87.1	—	Unverified
UNER v1 - PUD (English)	UNER XLM-R	F1 (micro)	80.1	—	Unverified
UNER v1 - PUD (Portuguese)	UNER XLM-R	F1 (micro)	88.8	—	Unverified
UNER v1 - PUD (Swedish)	UNER XLM-R	F1 (micro)	82.2	—	Unverified
UNER v1 (Serbian)	UNER XLM-R	F1 (micro)	94.7	—	Unverified
UNER v1 (Slovak)	UNER XLM-R	F1 (micro)	85.5	—	Unverified
UNER v1 (Swedish)	UNER XLM-R	F1 (micro)	88.3	—	Unverified
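
The table reports micro-averaged F1 but does not say how it is computed. As a rough illustration only, the sketch below assumes the common convention for NER: BIO-tagged tokens, entity-level exact-span matching, and counts pooled over all sentences and entity types before precision and recall are taken. The function names `extract_spans` and `micro_f1` are hypothetical and are not taken from the UNER code release, and the tag sequences are made-up toy data.

```python
def extract_spans(tags):
    """Convert one BIO tag sequence into a set of (start, end, type) spans."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def micro_f1(gold_sentences, pred_sentences):
    """Micro F1: pool span counts over all sentences and types, then score."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)   # spans with identical boundaries and type
        fp += len(p - g)   # predicted spans absent from gold
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (assumed data): the prediction misses one of three gold spans.
gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O"]]
pred = [["B-PER", "I-PER", "O", "O"], ["B-ORG", "O"]]
score = micro_f1(gold, pred)
print(f"micro F1 = {100 * score:.1f}")
```

Because the counts are pooled before scoring, frequent entity types weigh more than rare ones; a macro average would instead score each type separately and average the results. The table's values are on a 0 to 100 scale, hence the multiplication by 100 when printing.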
