Title
On Lloyd's Algorithm: New Theoretical Insights for Clustering in Practice.
Abstract
We provide new analyses of Lloyd's algorithm (1982), commonly known as the k-means clustering algorithm. Kumar and Kannan (2010) showed that running k-SVD followed by a constant approximation k-means algorithm, and then Lloyd's algorithm, will correctly cluster nearly all of the dataset with respect to the optimal clustering, provided the dataset satisfies a deterministic clusterability assumption. This method is viewed as the "Swiss Army knife" for clustering problems, subsuming popular generative models such as Gaussian mixtures. However, it is tailored to high dimensional data, i.e., when d >> k. We analyze Lloyd's algorithm for general d without using the spectral projection, which leads to a weaker assumption in the case d < k. Surprisingly, we show that a simple and scalable heuristic that combines random sampling with Single-Linkage serves as a good seeding algorithm for Lloyd's algorithm under this assumption. We then study stopping criteria for Lloyd's algorithm under the lens of clusterability, accompanied by controlled simulations.
Year
Venue
Field
2016
JMLR Workshop and Conference Proceedings
Convergence (routing),Computer science,Performance guarantee,Lloyd's algorithm,Rate of convergence,Artificial intelligence,Cluster analysis,Machine learning
DocType
Volume
ISSN
Conference
51
1938-7288
Citations 
PageRank 
References 
1
0.36
13
Authors
2
Name
Order
Citations
PageRank
Cheng Tang141.46
Claire Monteleoni232724.15