SOTAVerified

Greedy bi-criteria approximations for k-medians and k-means

2016-07-21Unverified0· sign in to hype

Daniel Hsu, Matus Telgarsky

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of k-medians and k-means, the key results are as follows. When the method considers all data points as candidate centers, then selecting O(k(1/)) centers achieves cost at most 2+ times the optimal cost with k centers. Alternatively, the same guarantees hold if each round samples O(k/^5) candidate centers proportionally to their cluster cost (as with kmeans++, but holding centers fixed). In the case of k-means, considering an augmented set of n^1/ candidate centers gives 1+ approximation with O(k(1/)) centers, the entire algorithm taking O(dk(1/)n^1+1/) time, where n is the number of data points in R^d. In the case of Euclidean k-medians, generating a candidate set via n^O(1/^2) executions of stochastic gradient descent with adaptively determined constraint sets will once again give approximation 1+ with O(k(1/)) centers in dk(1/)n^O(1/^2) time. Ancillary results include: guarantees for cluster costs based on powers of metrics; a brief, favorable empirical evaluation against kmeans++; data-dependent bounds allowing 1+ in the first two bullets above, for example with k-medians over finite metric spaces.

Tasks

Reproductions