sCAKE: Semantic Connectivity Aware Keyword Extraction
Swagata Duari, Vasudha Bhatnagar
Code
- github.com/SDuari/sCAKE-and-LAKE (official)
- github.com/sduari/keyword-extraction-datasets
- github.com/surbhim18/sCake
- github.com/lilaspourpre/kw_extraction
- github.com/SDuari/sCAKE-in-Python
Abstract
Keyword extraction is an important task in several text analysis endeavors. In this paper, we present a critical discussion of the issues and challenges in graph-based keyword extraction methods, along with a comprehensive empirical analysis. We propose a parameterless method for constructing a graph of text that captures the contextual relations between words. We also propose a novel word scoring method based on the connections between concepts. We demonstrate that each proposal is individually superior to its counterpart in state-of-the-art graph-based keyword extraction algorithms. Combining the proposed graph construction and scoring methods yields a novel, parameterless keyword extraction method (sCAKE) based on the semantic connectivity of words in the document. Motivated by the limited availability of NLP tools for several languages, we also design and present a language-agnostic keyword extraction (LAKE) method. We eliminate the need for NLP tools by using a statistical filter to identify candidate keywords before constructing the graph. We show that the resulting method is a competent solution for extracting keywords from documents in languages that lack sophisticated NLP support.
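For readers new to the graph-based family of methods the abstract discusses, the sketch below shows a generic pipeline: filter candidate words, link words that appear in the same context, and rank words by a graph score. It is not the sCAKE or LAKE algorithm. The sentence-level co-occurrence rule, the tiny stopword list, the weighted-degree score, and the names `build_graph` and `rank_words` are all assumptions chosen only for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "are", "and",
             "to", "for", "on", "with", "from"}


def build_graph(text):
    """Link every pair of distinct candidate words that share a sentence.

    Edge weights count how many sentences contain both words.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = {w for w in re.findall(r"[a-z]+", sentence)
                 if w not in STOPWORDS}
        for u, v in combinations(sorted(words), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def rank_words(graph, top_k=5):
    """Score each word by weighted degree; break ties alphabetically."""
    scores = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


text = ("Graph methods extract keywords from text. "
        "Keywords summarize text. "
        "Graph scoring ranks keywords.")
print(rank_words(build_graph(text), top_k=3))
```

Weighted degree is used here only because it is the simplest possible graph score; the scoring and graph construction proposed in the paper are described there and in the official repository listed above.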