Learning Dictionaries for Named Entity Recognition using Minimal Supervision
2015-04-24 · EACL 2014
Arvind Neelakantan, Michael Collins
Abstract
This paper describes an approach for automatically constructing dictionaries for Named Entity Recognition (NER) using large amounts of unlabeled data and a few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower-dimensional embeddings (representations) for candidate phrases and classify these phrases using a small number of labeled examples. Our method improves F1 score by 16.5% and 11.3% over co-training on disease and virus NER, respectively. We also show that adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings.
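The pipeline in the abstract (CCA embeddings for candidate phrases, then classification from a few labeled seeds) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which two views CCA is applied to, so the code uses two synthetic feature views of each candidate phrase, and a nearest-centroid rule stands in for whatever classifier the authors use. The names `cca_embed`, `view_a`, and `view_b` are hypothetical.

```python
import numpy as np

def cca_embed(X, Y, k, reg=1e-3):
    """Project view X onto its top-k canonical directions with respect to view Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, s, _ = np.linalg.svd(Wx @ Cxy @ Wy)
    return X @ (Wx @ U[:, :k]), s[:k]

# Synthetic stand-in data (assumption): 400 candidate phrases, binary label
# (entity vs. non-entity), and two noisy views that share the label signal.
rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)
signal = (2.0 * labels - 1.0)[:, None]
view_a = signal * np.linspace(0.5, 1.5, 20) + rng.normal(size=(n, 20))
view_b = signal * np.linspace(0.5, 1.5, 15) + rng.normal(size=(n, 15))

# Step 1: low-dimensional embeddings learned from unlabeled data only
emb, corrs = cca_embed(view_a, view_b, k=2)

# Step 2: classify with a small number of labeled seeds (10 per class)
seed_idx = np.concatenate([np.where(labels == c)[0][:10] for c in (0, 1)])
centroids = np.stack([emb[seed_idx][labels[seed_idx] == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)

# Evaluate only on phrases that were not used as seeds
held_out = np.ones(n, dtype=bool)
held_out[seed_idx] = False
accuracy = (pred[held_out] == labels[held_out]).mean()
print("top canonical correlations:", corrs)
print("held-out accuracy:", accuracy)
```

The point of the sketch is the division of labor: the projection is fit without any labels, and the seeds are only needed to place two centroids in the resulting low-dimensional space.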