Title | ||
---|---|---|
Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances. |
Abstract | ||
---|---|---|
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $ell_1, ell_2$ and $ell_infty$ distances. All our algorithms run in $O(log n)$ rounds of MPC for any fixed $d$ and achieve $(1+epsilon)$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for $o(log n)$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on a variety of datasets exhibiting speedups of several orders of magnitude. |
Year | Venue | Field |
---|---|---|
2017 | international conference on machine learning | Discrete mathematics,Hamming code,Binary logarithm,Combinatorics,Spark (mathematics),Exact algorithm,Massively parallel,Hardness of approximation,Algorithm,Cluster analysis,Mathematics,Single-linkage clustering |
DocType | Volume | Citations |
Journal | abs/1710.01431 | 1 |
PageRank | References | Authors |
0.35 | 1 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Grigory Yaroslavtsev | 1 | 209 | 17.36 |
Adithya Vadapalli | 2 | 1 | 0.35 |