Clustering Categorical Data: Soft Rounding k-modes

2022-10-18

Surya Teja Gavva, Karthik C. S., Sharath Punna

Abstract

Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis. Despite the proposal of various clustering algorithms, the classical k-modes algorithm remains a popular choice for unsupervised learning of categorical data. Surprisingly, our first insight is that in a natural generative block model, the k-modes algorithm performs poorly for a large range of parameters. We remedy this issue by proposing a soft rounding variant of the k-modes algorithm (SoftModes) and theoretically prove that our variant addresses the drawbacks of the k-modes algorithm in the generative model. Finally, we empirically verify that SoftModes performs well on both synthetic and real-world datasets.
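
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the classical k-modes iteration only: assign each point to the nearest mode under Hamming distance, then recompute each mode as the per-attribute most frequent category. It is not the SoftModes variant, whose rounding rule is not described in this abstract. The function names, the random initialization, and the handling of empty clusters are assumptions made for illustration.

```python
from collections import Counter
import random


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(points, k, n_iter=10, seed=0):
    """Classical k-modes sketch (illustrative, not the paper's code).

    points: list of equal-length tuples of categorical values.
    Returns (modes, labels).
    """
    rng = random.Random(seed)
    # Assumption: initialize modes by sampling k distinct input positions.
    modes = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest mode under Hamming distance.
        labels = [
            min(range(k), key=lambda j: hamming(p, modes[j])) for p in points
        ]
        # Update step: per-attribute majority category within each cluster.
        new_modes = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # Assumption: keep the old mode if a cluster becomes empty.
                new_modes.append(modes[j])
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break
        modes = new_modes
    return modes, labels
```

The update step here is a hard majority vote per attribute; by its name, the soft rounding variant proposed in the paper presumably replaces this hard choice, but the exact rule should be taken from the paper itself.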
