Abstract | ||
---|---|---|
The k-means++ seeding algorithm is one of the most popular methods for choosing the initial k centers for Lloyd's algorithm for the k-means problem. Brunsch and Röglin [9] conjectured that k-means++ behaves well on datasets of small dimension. More specifically, they conjectured that the k-means++ seeding algorithm gives an O(log d) approximation with high probability for any d-dimensional dataset. In this work, we refute this conjecture by giving two-dimensional datasets on which the k-means++ seeding algorithm achieves an O(log k) approximation ratio with probability exponentially small in k. This solves open problems posed by Mahajan et al. [12] and by Brunsch and Röglin [9]. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1016/j.tcs.2016.04.012 | Theoretical Computer Science |
Keywords | Field | DocType |
k-means++, Lower bounds | Discrete mathematics, k-means clustering, Combinatorics, Upper and lower bounds, Conjecture, Seeding, Mathematics | Journal |
Volume | Issue | ISSN |
634 | C | 0304-3975 |
Citations | PageRank | References |
0 | 0.34 | 10 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Anup Bhattacharya | 1 | 10 | 4.27 |
Ragesh Jaiswal | 2 | 220 | 18.33 |
Nir Ailon | 3 | 1114 | 70.74 |
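
For readers unfamiliar with the seeding step named in the abstract, the following is a minimal sketch of k-means++ seeding as it is commonly described (D² sampling): the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance to the nearest center chosen so far. The function names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are illustrative assumptions and do not come from the paper; the sketch does not reproduce the paper's lower-bound construction.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    # First center: uniform at random from the data.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample one point with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

A call such as `kmeanspp_seed(points, k=3, rng=random.Random(0))` returns three of the input points, and `kmeans_cost` evaluates the resulting seeding. The approximation ratio discussed in the abstract compares this cost with the optimal k-means cost.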