
CLGP: Multi-Feature Embedding based Cross-Attention for Chinese NER

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

Previous work fuses lexicon information but ignores two important characteristics of Chinese, glyph and pinyin, which carry significant syntactic and semantic information for sequence tagging tasks. This paper proposes CLGP, which uses three dedicated extractors to obtain glyph, pinyin, and lexicon embeddings, and a cross-attention-based network to fuse these multi-feature embeddings. Specifically, we introduce an embedding scheme that preserves the lexicon matching results, and design two CNN architectures to extract the glyph and pinyin embeddings. We then fuse the four embeddings with the cross-attention-based network to enhance Chinese NER. Experimental results on four widely used datasets show that CLGP achieves state-of-the-art (SOTA) performance.
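The abstract does not give the details of the fusion network, so the following is only a minimal sketch of what cross-attention-based fusion of several per-character embeddings could look like. It assumes that character embeddings act as queries and that each feature sequence (glyph, pinyin, lexicon) supplies the keys and values, that all embeddings share one dimension, and that the attended features are added back to the character embeddings. The function names `cross_attention` and `fuse`, the omission of learned projections, and the toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    queries: list of n vectors of size d
    keys, values: lists of m vectors (keys of size d)
    Returns n vectors, each a weighted average of the value vectors.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def fuse(char_emb, feature_embs):
    """Hypothetical fusion: character embeddings attend to each feature
    sequence, and the attended vectors are summed onto the characters.
    Assumes every feature embedding has the same size as char_emb."""
    fused = [list(row) for row in char_emb]
    for feat in feature_embs:
        attended = cross_attention(char_emb, feat, feat)
        for i, row in enumerate(attended):
            for j, val in enumerate(row):
                fused[i][j] += val
    return fused


# Toy example: a 2-character sentence with 2-dimensional embeddings.
char_emb = [[1.0, 0.0], [0.0, 1.0]]
glyph_emb = [[0.5, 0.5], [0.1, 0.9]]
pinyin_emb = [[0.2, 0.8], [0.7, 0.3]]
lexicon_emb = [[1.0, 1.0], [0.0, 0.0]]

fused = fuse(char_emb, [glyph_emb, pinyin_emb, lexicon_emb])
```

In this sketch `fused` keeps one vector per character, so it could be passed to any sequence tagging layer. A practical model would normally add learned query, key, and value projections and multiple attention heads.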
