Vec2GC -- A Graph Based Clustering Method for Text Representations
2021-04-15
Rajesh N Rao, Manojit Chakraborty
Abstract
NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method applies community detection to a weighted graph of the terms or documents, which is constructed from learned text representations. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
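The abstract describes the pipeline only at a high level: embed each term or document, build a weighted graph over the embeddings, then detect communities in that graph. The sketch below is a minimal, stdlib-only illustration of that idea. It is not the authors' implementation, and it rests on several assumptions. Nodes are linked when their cosine similarity is at or above a fixed threshold. The edge weight is simply that similarity. Weighted label propagation stands in for whatever community detection algorithm Vec2GC actually uses. All function names (`cosine`, `build_graph`, `label_propagation`, `cluster`) are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def build_graph(vectors, threshold=0.5):
    """Weighted adjacency dict: edge i-j when cosine(i, j) >= threshold.

    Assumption (not from the paper): the edge weight is the raw cosine similarity.
    """
    n = len(vectors)
    graph = {i: {} for i in range(n)}  # every node present, even if isolated
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                graph[i][j] = s
                graph[j][i] = s
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation, a stand-in community detection method."""
    labels = {node: node for node in graph}  # each node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated nodes keep their own label
            scores = defaultdict(float)
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w  # sum edge weight per neighbouring label
            best = max(scores.values())
            # Deterministic tie-break: smallest label among the top-scoring ones.
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged
    return labels


def cluster(vectors, threshold=0.5):
    """Return clusters as sorted lists of vector indices."""
    labels = label_propagation(build_graph(vectors, threshold))
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy 2-D "embeddings": two near-parallel pairs.
    vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(cluster(vecs, threshold=0.5))  # [[0, 1], [2, 3]]
```

With the toy vectors above, only the pairs (0, 1) and (2, 3) clear the 0.5 threshold, so each pair forms one community. A hierarchical variant could be approximated by re-running `cluster` inside each community with a stricter threshold. That step is also an assumption, not a description of how Vec2GC builds its hierarchy.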