On Lloyd's Algorithm: New Theoretical Insights for Clustering in Practice. - Citegraph

Paper Info

Title
On Lloyd's Algorithm: New Theoretical Insights for Clustering in Practice.

Abstract
We provide new analyses of Lloyd's algorithm (1982), commonly known as the k-means clustering algorithm. Kumar and Kannan (2010) showed that running k-SVD followed by a constant approximation k-means algorithm, and then Lloyd's algorithm, will correctly cluster nearly all of the dataset with respect to the optimal clustering, provided the dataset satisfies a deterministic clusterability assumption. This method is viewed as the "Swiss Army knife" for clustering problems, subsuming popular generative models such as Gaussian mixtures. However, it is tailored to high dimensional data, i.e., when d >> k. We analyze Lloyd's algorithm for general d without using the spectral projection, which leads to a weaker assumption in the case d < k. Surprisingly, we show that a simple and scalable heuristic that combines random sampling with Single-Linkage serves as a good seeding algorithm for Lloyd's algorithm under this assumption. We then study stopping criteria for Lloyd's algorithm under the lens of clusterability, accompanied by controlled simulations.

Year	Venue	Field
2016	JMLR Workshop and Conference Proceedings	Convergence (routing),Computer science,Performance guarantee,Lloyd's algorithm,Rate of convergence,Artificial intelligence,Cluster analysis,Machine learning
DocType	Volume	ISSN
Conference	51	1938-7288
Citations	PageRank	References
1	0.36	13
Authors
2

Authors (2 rows)

Cited by (1 rows)

References (13 rows)

Name	Order	Citations	PageRank
Cheng Tang	1	4	1.46
Claire Monteleoni	2	327	24.15

1