Local Search Yields a PTAS for k-Means in Doubling Metrics - Citegraph

Paper Info

Title
Local Search Yields a PTAS for k-Means in Doubling Metrics

Abstract
The most well known and ubiquitous clustering problem encountered in nearly every branch of science is undoubtedly k-MEANS: given a set of data points and a parameter k, select k centres and partition the data points into k clusters around these centres so that the sum of squares of distances of the points to their cluster centre is minimized. Typically these data points lie in Euclidean space R^d for some d ≥ 2. k-MEANS and the first algorithms for it were introduced in the 1950s. Over the last six decades, hundreds of papers have studied this problem and different algorithms have been proposed for it. The most commonly used algorithm in practice is known as Lloyd-Forgy, which is also referred to as "the" k-MEANS algorithm, and various extensions of it often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum solution. Kanungo et al. [2004] analyzed a very simple local search heuristic to get a polynomial-time algorithm with approximation ratio 9 + ε for any fixed ε > 0 for k-MEANS in Euclidean space. Finding an algorithm with a better worst-case approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS for fixed dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for k-MEANS for R^d for any fixed d.
More precisely, for any error parameter ε > 0, the local search algorithm that considers swaps of up to ρ = d^{O(d)} · ε^{-O(d/ε)} centres at a time will produce a solution using exactly k centres whose cost is at most a (1+ε)-factor greater than the optimum solution. Our analysis extends very easily to the more general settings where we want to minimize the sum of q'th powers of the distances between data points and their cluster centres (instead of sum of squares of distances as in k-MEANS) for any fixed q ≥ 1 and where the metric may not be Euclidean but still has fixed doubling dimension.
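
The multi-swap local search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that candidate centres are restricted to the input points, that the starting solution is arbitrary, and that any swap lowering the cost by more than a small absolute tolerance is accepted. The names `cost`, `improving_swap`, `local_search`, `rho`, `q`, and `tol` are hypothetical. Polynomial running-time bounds for local search typically rely on a relative improvement threshold, which this sketch does not attempt.

```python
from itertools import combinations


def cost(points, centres, q=2):
    """Sum over points of (Euclidean distance to the nearest centre) ** q."""
    total = 0.0
    for p in points:
        nearest_sq = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
        total += nearest_sq ** (q / 2)
    return total


def improving_swap(points, centres, candidates, best, rho, q, tol):
    """Return (new_centres, new_cost) for the first improving swap, else None."""
    outside = [c for c in candidates if c not in centres]
    # Try exchanging t centres for t non-centres, for t = 1 .. rho.
    for t in range(1, min(rho, len(centres), len(outside)) + 1):
        for removed in combinations(centres, t):
            kept = [c for c in centres if c not in removed]
            for added in combinations(outside, t):
                trial = kept + list(added)
                trial_cost = cost(points, trial, q)
                if trial_cost < best - tol:
                    return trial, trial_cost
    return None


def local_search(points, k, rho=1, q=2, tol=1e-9):
    """Multi-swap local search that always keeps exactly k centres.

    Assumption: centres are chosen among the (distinct) input points,
    and there are at least k distinct points.
    """
    candidates = list(dict.fromkeys(points))  # de-duplicate, keep order
    centres = candidates[:k]                  # arbitrary initial solution
    best = cost(points, centres, q)
    while True:
        step = improving_swap(points, centres, candidates, best, rho, q, tol)
        if step is None:                      # local optimum reached
            return centres, best
        centres, best = step


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(local_search(pts, k=2, rho=1))
```

Each accepted swap strictly lowers the cost, so the loop terminates at a solution that no swap of at most `rho` centres can improve. Setting `q` to a value other than 2 gives the q'th-power objective mentioned at the end of the abstract.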

Year	DOI	Venue
2016	10.1109/FOCS.2016.47	2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
DocType	Volume
Conference	abs/1603.08976
ISSN	ISBN	Citations
0272-5428	978-1-5090-3934-0	25
PageRank	References	Authors
0.85	36	3

Authors (3 rows)

Name	Order	Citations	PageRank
Zachary Friggstad	1	133	15.66
Mohsen Rezapour	2	38	4.76
Mohammad R. Salavatipour	3	690	62.40
