Complexity and Approximation of the Fuzzy K-Means Problem

2015-12-18Unverified0· sign in to hype

Johannes Blömer, Sascha Brauer, Kathrin Bujna

Unverified — Be the first to reproduce this paper.

Abstract

The fuzzy K-means problem is a generalization of the classical K-means problem to soft clusterings, i.e. clusterings where each points belongs to each cluster to some degree. Although popular in practice, prior to this work the fuzzy K-means problem has not been studied from a complexity theoretic or algorithmic perspective. We show that optimal solutions for fuzzy K-means cannot, in general, be expressed by radicals over the input points. Surprisingly, this already holds for very simple inputs in one-dimensional space. Hence, one cannot expect to compute optimal solutions exactly. We give the first (1+)-approximation algorithms for the fuzzy K-means problem. First, we present a deterministic approximation algorithm whose runtime is polynomial in N and linear in the dimension D of the input set, given that K is constant, i.e. a polynomial time approximation algorithm given a fixed K. We achieve this result by showing that for each soft clustering there exists a hard clustering with comparable properties. Second, by using techniques known from coreset constructions for the K-means problem, we develop a deterministic approximation algorithm that runs in time almost linear in N but exponential in the dimension D. We complement these results with a randomized algorithm which imposes some natural restrictions on the input set and whose runtime is comparable to some of the most efficient approximation algorithms for K-means, i.e. linear in the number of points and the dimension, but exponential in the number of clusters.

Tasks

Clustering

Complexity and Approximation of the Fuzzy K-Means Problem

Abstract

Tasks

Reproductions