Comparison of Representations of Named Entities for Document Classification

2018-07-01 · WS 2018

Lidia Pivovarova, Roman Yangarber

Abstract

We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
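The three treatments of names compared in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function `represent`, the mode names, the `(start, end, type)` span format, and the example sentence are all hypothetical, chosen only to show how the feature sequences differ before they reach a classifier.

```python
from typing import List, Tuple

# Assumed input format (hypothetical): a token list plus named-entity spans
# given as (start, end, entity_type), with `end` exclusive.
Span = Tuple[int, int, str]


def represent(tokens: List[str], spans: List[Span], mode: str) -> List[str]:
    """Return the feature sequence under one treatment of named entities.

    mode="split"  : every token of a name stays a separate feature
    mode="joined" : a multi-word name becomes one feature
    mode="type"   : a name is replaced by its entity type
    """
    if mode not in {"split", "joined", "type"}:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "split":
        return list(tokens)

    # Map each span start to its end and type for a single left-to-right pass.
    starts = {start: (end, etype) for start, end, etype in spans}
    features: List[str] = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end, etype = starts[i]
            if mode == "joined":
                features.append("_".join(tokens[i:end]))
            else:
                features.append(etype)
            i = end
        else:
            features.append(tokens[i])
            i += 1
    return features


# Illustrative sentence and spans (made up for this sketch).
tokens = ["Deutsche", "Bank", "cut", "rates", "in", "Frankfurt"]
spans = [(0, 2, "ORG"), (5, 6, "LOC")]

for mode in ("split", "joined", "type"):
    print(mode, represent(tokens, spans, mode))
```

Under the "split" mode, which the abstract reports as the best treatment, the name contributes its individual tokens, so a token such as "Bank" is shared across every name that contains it.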

Tasks

Classification, Document Classification, General Classification, Representation Learning, Text Classification, Topic Classification, Word Embeddings
