An Empirical Evaluation of k-Means Coresets

2022-07-03

Chris Schwiegelshohn, Omar Ali Sheikh-Omar

Code Available

Code

github.com/sheikhomar/eval-k-means-coresets
Official · In paper · none · ★ 2

Abstract

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high-performance coresets for clustering problems such as k-means in both theory and practice. Curiously, there exists no work comparing the quality of available k-means coresets. In this paper we perform such an evaluation. There is currently no known algorithm for measuring the distortion of a candidate coreset. We provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows for an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
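To make the notion of distortion concrete, here is a minimal Python sketch. It is not the authors' code or benchmark. It assumes a weighted coreset built by plain uniform sampling (a stand-in for illustration only) and compares the k-means cost of one candidate set of centers on the full data against the weighted cost on the coreset. The helper names `kmeans_cost`, `uniform_coreset`, and `distortion` are hypothetical. Because the ratio is measured for a single candidate solution, it only lower-bounds the worst-case distortion over all solutions, which is the quantity the abstract describes as hard to measure.

```python
import numpy as np

def kmeans_cost(points, centers, weights=None):
    """Sum of (optionally weighted) squared distances to the nearest center."""
    # Pairwise squared distances, shape (n_points, n_centers), then min per point.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    if weights is None:
        return float(d2.sum())
    return float((weights * d2).sum())

def uniform_coreset(points, m, rng):
    """Illustrative stand-in: sample m points uniformly, each with weight n/m."""
    idx = rng.choice(len(points), size=m, replace=False)
    weights = np.full(m, len(points) / m)
    return points[idx], weights

def distortion(points, coreset, weights, centers):
    """Symmetric cost ratio (>= 1) for one candidate solution (assumed nonzero costs)."""
    full = kmeans_cost(points, centers)
    approx = kmeans_cost(coreset, centers, weights)
    return max(full / approx, approx / full)

rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 2))          # synthetic data, assumption
coreset, weights = uniform_coreset(points, 200, rng)
# Candidate solution: 5 centers picked at random from the data (assumption).
centers = points[rng.choice(len(points), size=5, replace=False)]
ratio = distortion(points, coreset, weights, centers)
print(f"distortion on this candidate solution: {ratio:.3f}")
```

A ratio close to 1 means the coreset preserves the cost of this particular solution well; it says nothing about solutions that were not tested.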

Tasks

Clustering
