Abstract | ||
---|---|---|
We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2395116.2395117 | Berkeley, CA |
Keywords | Field | DocType |
near-optimal clustering solution,high-dimensional data,Lloyd-type iteration,output quality,performance guarantee,data set,lloyd-type method,k-means problem,well-clusterable instance,approximation algorithm,similar guarantee,clustering quality | Discrete mathematics,Randomized algorithm,Approximation algorithm,k-means clustering,Heuristic,Computer science,Popularity,Theoretical computer science,Cluster analysis | Journal |
Volume | Issue | ISSN |
59 | 6 | 0272-5428 |
ISBN | Citations | PageRank |
0-7695-2720-5 | 96 | 8.63 |
References | Authors | |
44 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Rafail Ostrovsky | 1 | 8743 | 588.15 |
Yuval Rabani | 2 | 2265 | 274.98 |
Leonard J. Schulman | 3 | 1328 | 136.88 |
Chaitanya Swamy | 4 | 1139 | 82.64 |