Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora

2018-07-26

Sajawel Ahmed, Alexander Mehler

Code

github.com/FID-Biodiversity/GermanWordEmbeddings-NER

Abstract

This study improves the performance of neural named entity recognition by up to 11% in F-score, using German as an example of a low-resource language, thereby outperforming existing baselines and establishing a new state of the art on every open-source dataset. Rather than designing deeper and wider hybrid neural architectures, we gather all available resources and perform a detailed optimization and grammar-dependent morphological processing, consisting of lemmatization and part-of-speech tagging, before exposing the raw data to any training process. We test our approach in a threefold monolingual experimental setup of a) single, b) joint, and c) optimized training, and shed light on the dependency of downstream tasks on the size of the corpora used to compute word embeddings.
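
The morphological processing mentioned in the abstract can be pictured with a small sketch. It assumes that an external tagger has already produced (surface form, lemma, POS tag) triples and that each token is rewritten as a combined lemma and tag unit before the embedding corpus is built. The function name `to_embedding_tokens`, the `_` separator, and the toy tags are illustrative assumptions, not the authors' actual pipeline or output format.

```python
from typing import Iterable, List, Tuple

# (surface form, lemma, POS tag), assumed to come from an external tagger
Token = Tuple[str, str, str]


def to_embedding_tokens(sentence: Iterable[Token], sep: str = "_") -> List[str]:
    """Replace each surface form with a lemma+POS unit (hypothetical format)."""
    return [f"{lemma}{sep}{pos}" for _form, lemma, pos in sentence]


# Hand-annotated toy sentence; the tags are illustrative, not tagger output.
sentence = [
    ("Die", "der", "ART"),
    ("Häuser", "Haus", "NN"),
    ("stehen", "stehen", "VVFIN"),
]

print(to_embedding_tokens(sentence))
```

Collapsing inflected forms onto a lemma reduces vocabulary sparsity, which is one plausible reason such preprocessing could help embeddings for a morphologically rich language; the abstract itself does not spell out the mechanism.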

Tasks

Lemmatization, Named Entity Recognition (NER), Part-Of-Speech Tagging, Word Embeddings