Comparison of Representations of Named Entities for Document Classification
2018-07-01 · WS 2018
Lidia Pivovarova, Roman Yangarber
Abstract
We explore representations of multi-word names for text classification, using topic and sector classification on the Reuters corpus (RCV1). We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; named entities (NEs) have more impact on sector classification than on topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
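To make the compared name treatments concrete, here is a minimal sketch of how a tagged sentence could be turned into features under each one. It is an illustration, not the authors' pipeline. The input format (text spans paired with an optional entity type), the function names, the `<TYPE>` placeholder syntax, and the underscore-joined single-feature variant are all assumptions introduced here.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (text, entity_type) spans, where entity_type is None
# for ordinary text. A real system would obtain these from an NER tagger.
Span = Tuple[str, Optional[str]]


def split_tokens(spans: List[Span]) -> List[str]:
    """Split every span, names included, into separate token features."""
    feats: List[str] = []
    for text, _ in spans:
        feats.extend(text.split())
    return feats


def joined_names(spans: List[Span]) -> List[str]:
    """Keep each multi-word name as one feature (assumed contrast case)."""
    feats: List[str] = []
    for text, etype in spans:
        if etype is None:
            feats.extend(text.split())
        else:
            feats.append("_".join(text.split()))
    return feats


def entity_types(spans: List[Span]) -> List[str]:
    """Replace each name with a placeholder for its entity type."""
    feats: List[str] = []
    for text, etype in spans:
        if etype is None:
            feats.extend(text.split())
        else:
            feats.append(f"<{etype}>")
    return feats


example: List[Span] = [
    ("Shares of", None),
    ("Deutsche Bank", "ORG"),
    ("fell in", None),
    ("New York", "LOC"),
]

print(split_tokens(example))
print(joined_names(example))
print(entity_types(example))
```

The first function corresponds to the treatment the abstract reports as most effective; the third corresponds to the entity-type replacement it reports as ineffective.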