Evaluating the Utility of Hand-crafted Features in Sequence Labelling

2018-08-28 · EMNLP 2018 · Code Available

Minghao Wu, Fei Liu, Trevor Cohn


Code

github.com/minghao-wu/CRF-AE
Official · In paper · PyTorch

Abstract

Conventional wisdom is that hand-crafted features are redundant for deep learning models, as they already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting hand-crafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain an F1 of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding over using features as either inputs or outputs alone, and moreover show that including the auto-encoder components reduces training requirements to 60%, while retaining the same predictive accuracy.
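The hybrid objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a per-token softmax tagger in place of the neural CRF, treats each hand-crafted feature (for example POS tag or word shape) as a categorical target that the model must reconstruct, and combines the losses with a weighted sum. The function names, the feature names, and the `weights` parameter are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, gold_index):
    """Negative log-probability of the gold class under softmax(scores)."""
    return -math.log(softmax(scores)[gold_index])


def hybrid_loss(tag_scores, gold_tag, feature_scores, gold_features, weights):
    """Tagging loss plus a weighted reconstruction loss per feature.

    Assumption: a token-level softmax stands in for the CRF likelihood,
    and every hand-crafted feature is a single categorical label.
    """
    loss = cross_entropy(tag_scores, gold_tag)
    for name, scores in feature_scores.items():
        # Auto-encoder term: predict the feature that was also given as input.
        loss += weights[name] * cross_entropy(scores, gold_features[name])
    return loss


# Hypothetical scores for one token: 3 entity tags, 2 POS tags, 3 word shapes.
tag_scores = [2.0, 0.5, 0.1]
feature_scores = {"pos": [1.0, 0.2], "shape": [0.3, 0.3, 1.5]}
gold_features = {"pos": 0, "shape": 2}
weights = {"pos": 1.0, "shape": 1.0}

loss = hybrid_loss(tag_scores, 0, feature_scores, gold_features, weights)
print(f"hybrid loss: {loss:.4f}")
```

Setting every weight to zero recovers the plain tagging loss, which is one way to picture the kind of comparison an ablation over the auto-encoder components makes.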

Tasks

Named Entity Recognition (NER)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CoNLL 2003 (English)	Neural-CRF+AE	F1	92.29	—	Unverified
CoNLL 2003 (English)	CRF + AutoEncoder	F1	91.87	—	Unverified
