Robust Lexical Features for Improved Neural Network Named-Entity Recognition

2018-06-09 · COLING 2018 · Code Available

Abbas Ghaddar, Philippe Langlais

Code

github.com/ghaddarAbs/NER-with-LS (TensorFlow)

Abstract

Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers. In this work, we show that this is unfair: lexical features are actually quite useful. We propose to embed words and entity types into a low-dimensional vector space that we train from annotated data produced by distant supervision from Wikipedia. From this space, we compute (offline) a feature vector representing each word. When used with a vanilla recurrent neural network model, this representation yields substantial improvements. We establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while matching state-of-the-art performance with an F1 score of 91.73 on the over-studied CONLL-2003 dataset.
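
To make the offline feature computation concrete, here is a minimal Python sketch. It assumes (the abstract does not spell this out) that a word's feature vector holds one similarity score per entity type, computed as the cosine between the word embedding and each entity-type embedding in the shared space. The names `cosine` and `lexical_feature_vector`, the two entity types, and the toy 2-dimensional embeddings are all hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexical_feature_vector(word_vec, type_vecs, type_order):
    """One similarity score per entity type, in a fixed type order (assumed design)."""
    return [cosine(word_vec, type_vecs[t]) for t in type_order]


# Toy embeddings, invented for illustration; words and types share one space.
type_vecs = {"PERSON": [1.0, 0.0], "LOCATION": [0.0, 1.0]}
type_order = sorted(type_vecs)  # fixed order: ["LOCATION", "PERSON"]
word_vecs = {"paris": [0.0, 2.0], "alice": [3.0, 0.0]}

# Offline step: precompute one feature vector per vocabulary word.
features = {
    word: lexical_feature_vector(vec, type_vecs, type_order)
    for word, vec in word_vecs.items()
}
```

Under this assumption, the precomputed table `features` could be looked up per token at training and inference time and concatenated with the other inputs of the recurrent tagger, so no similarity computation happens inside the network.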

Tasks

Named Entity Recognition (NER)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CoNLL 2003 (English)	Bi-LSTM-CRF + Lexical Features	F1	91.73	—	Unverified
Ontonotes v5 (English)	Bi-LSTM-CRF + Lexical Features	F1	87.95	—	Unverified
