Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum
Code
- github.com/iesl/dilated-cnn-ner (official, in paper; TensorFlow; ★ 0)
- github.com/john-hewitt/conditional-probing (PyTorch; ★ 21)
- github.com/zjuym/chinese_cws_ner (TensorFlow; ★ 0)
- github.com/Tuofengalways/ee_Model (PyTorch; ★ 0)
Abstract
Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs. Recent advances in GPU hardware have led to the emergence of bi-directional LSTMs as a standard method for obtaining per-token vector representations serving as input to labeling tasks such as NER (often followed by prediction in a linear-chain CRF). Though expressive and accurate, these models fail to fully exploit GPU parallelism, limiting their computational efficiency. This paper proposes a faster alternative to Bi-LSTMs for NER: Iterated Dilated Convolutional Neural Networks (ID-CNNs), which have better capacity than traditional CNNs for large context and structured prediction. Unlike LSTMs whose sequential processing on sentences of length N requires O(N) time even in the face of parallelism, ID-CNNs permit fixed-depth convolutions to run in parallel across entire documents. We describe a distinct combination of network structure, parameter sharing and training procedures that enable dramatic 14-20x test-time speedups while retaining accuracy comparable to the Bi-LSTM-CRF. Moreover, ID-CNNs trained to aggregate context from the entire document are even more accurate while maintaining 8x faster test time speeds.
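To make the core idea concrete, here is a minimal NumPy sketch of stacked dilated 1-D convolutions applied repeatedly with shared weights, in the spirit of the ID-CNN described above. This is not the authors' TensorFlow implementation; the function names, kernel size of 3, and dilation schedule (1, 2, 4) are illustrative assumptions. The point it demonstrates: each layer's taps reach `dilation` positions apart, so depth grows the receptive field exponentially while every token is processed in parallel.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Same-padded 1-D dilated convolution over a token sequence.

    x: (seq_len, in_dim); w: (kernel_size, in_dim, out_dim).
    A tap at kernel offset k reads the input k*dilation positions away,
    so stacking layers with dilations 1, 2, 4, ... widens the effective
    context exponentially with depth.
    """
    seq_len, _ = x.shape
    kernel, _, out_dim = w.shape
    half = (kernel // 2) * dilation
    padded = np.pad(x, ((half, half), (0, 0)))
    out = np.zeros((seq_len, out_dim))
    for k in range(kernel):
        # Slice of the padded input aligned with kernel tap k.
        seg = padded[k * dilation : k * dilation + seq_len]
        out += seg @ w[k]
    return np.maximum(out, 0.0)  # ReLU nonlinearity

def id_cnn_block(x, weights, dilations=(1, 2, 4)):
    """One block: a short stack of dilated convolutions."""
    for w, d in zip(weights, dilations):
        x = dilated_conv1d(x, w, d)
    return x

rng = np.random.default_rng(0)
seq_len, dim = 12, 8
x = rng.standard_normal((seq_len, dim))
# One weight tensor per dilation. In the ID-CNN, the same block is
# "iterated" (reapplied) with these weights shared across iterations.
weights = [rng.standard_normal((3, dim, dim)) * 0.1 for _ in range(3)]
h = x
for _ in range(2):  # iterate the shared block twice
    h = id_cnn_block(h, weights)
print(h.shape)  # one vector per token, ready for a tagging/CRF layer
```

Because each layer is a fixed-depth convolution over the whole sequence, a GPU can compute all token positions at once, in contrast to an LSTM's left-to-right O(N) dependency chain.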
Tasks
- Named Entity Recognition (NER)
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| OntoNotes 5.0 (English) | BiLSTM-CRF | F1 | 86.99 | — | Unverified |
| OntoNotes 5.0 (English) | Iterated Dilated CNN | F1 | 86.84 | — | Unverified |