Near-Optimal Explainable k-Means for All Dimensions

2021-06-29

Moses Charikar, Lunjia Hu

Abstract

Many clustering algorithms are guided by certain cost functions such as the widely-used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most k^{1-2/d} poly(d log k) times the minimum cost achievable by a clustering without the explainability constraint, assuming k, d ≥ 2. Taking the minimum of this bound and the k polylog(k) bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of k^{1-2/d} polylog(k), which we show is optimal for every choice of k, d ≥ 2 up to a poly-logarithmic factor in k. For d = 2 in particular, we show an O(log k log log k) bound, improving near-exponentially over the previous best bound of O(k log k) by Laber and Murtinho (ICML 2021).
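To make the decision-tree notion of explainable clustering concrete, here is a minimal Python sketch, not the paper's algorithm: given a set of reference k-means centers, it greedily picks axis-parallel threshold cuts that separate the centers, and each leaf of the resulting tree defines one cluster. The function name and the gap-based cut rule are illustrative assumptions; the sketch assumes the centers are distinct.

```python
import numpy as np

def threshold_tree_labels(points, centers):
    """Assign each point to a reference center using only axis-parallel
    threshold cuts (a greedy illustration of explainable clustering,
    not the algorithm analyzed in the paper).

    points:  (n, d) array of data points
    centers: (k, d) array of distinct reference centers
    Returns an (n,) integer array of cluster labels.
    """
    labels = np.empty(len(points), dtype=int)

    def split(point_idx, center_idx):
        # Leaf: a single center remains, so all points here join its cluster.
        if len(center_idx) == 1:
            labels[point_idx] = center_idx[0]
            return
        # Choose the coordinate along which the remaining centers are most
        # spread out, then cut at the midpoint of the widest gap between
        # consecutive center coordinates (illustrative heuristic).
        c = centers[center_idx]
        dim = int(np.argmax(c.max(axis=0) - c.min(axis=0)))
        vals = np.sort(c[:, dim])
        g = int(np.argmax(np.diff(vals)))
        theta = (vals[g] + vals[g + 1]) / 2
        # The cut "x[dim] <= theta" partitions both centers and points.
        left_c = [i for i in center_idx if centers[i, dim] <= theta]
        right_c = [i for i in center_idx if centers[i, dim] > theta]
        left_p = [i for i in point_idx if points[i, dim] <= theta]
        right_p = [i for i in point_idx if points[i, dim] > theta]
        split(left_p, left_c)
        split(right_p, right_c)

    split(list(range(len(points))), list(range(len(centers))))
    return labels
```

Each cluster is then "explained" by the conjunction of the threshold conditions on its root-to-leaf path, which is exactly the kind of axis-parallel decision-tree boundary whose extra k-means cost the paper bounds.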
