Startseite > Publikationen > Publikationen


Frahling, Gereon;Sohler, Christian:

A Fast k-Means Implementation Using Coresets.

In: ACM Symposium on Computational Geometry, S. 135-143, 2006


In this paper we develop an efficient implementation for a $k$-means clustering algorithm.
The novel feature of our algorithm is that it uses coresets to speed up the algorithm.
A coreset is a small weighted set of points that approximates the original point set with respect
to the considered problem. The main strength of the algorithm is that it can quickly
determine clusterings of the same point set for many values of $k$. This is necessary in many
applications, since, typically, one does not know a good value for $k$ in advance. Once we have clusterings
for many different values of $k$ we can determine a good choice of $k$ using a quality
measure of clusterings that is independent of $k$, for example the average silhouette coefficient.
The average silhouette coefficient can be approximated using coresets.

To evaluate the performance of our algorithm we compare it with algorithm KMHybrid cite-M05
on typical $3D$ data sets for an image compression application and on artificially created instances.
Our data sets consist of $300,000$ to $4.9$ million points.
We show that our algorithm significantly outperforms KMHybrid on most of these input instances.
Additionally, the quality of the solutions computed by our algorithm deviates less than that of KMHybrid.

We also computed clusterings and approximate average silhouette coefficient for $k=1,dots, 100$
for our input instances and discuss the performance of our algorithm in detail.




author = {Frahling, Gereon and Sohler, Christian},
title = {A Fast k-Means Implementation Using Coresets},
booktitle = {ACM Symposium on Computational Geometry},
pages = {135-143},
year = {2006},

BibTeX in die Zwischenablage kopieren