Fast and Accurate k-means++ via Rejection Sampling

2020-12-22NeurIPS 2020Unverified0· sign in to hype

Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

Unverified — Be the first to reproduce this paper.

Abstract

k-means++ arthur2007k is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, k-means++ sometimes suffers from being slow on large data-sets so a natural question has been to obtain more efficient algorithms with similar guarantees. In this paper, we present a near linear time algorithm for k-means++ seeding. Interestingly our algorithm obtains the same theoretical guarantees as k-means++ and significantly improves earlier results on fast k-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than k-means++ and obtains solutions of equivalent quality.

Tasks

Clustering

Fast and Accurate k-means++ via Rejection Sampling

Abstract

Tasks

Reproductions