Title
Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances.
Abstract
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $ell_1, ell_2$ and $ell_infty$ distances. All our algorithms run in $O(log n)$ rounds of MPC for any fixed $d$ and achieve $(1+epsilon)$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for $o(log n)$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on a variety of datasets exhibiting speedups of several orders of magnitude.
Year
Venue
Field
2017
international conference on machine learning
Discrete mathematics,Hamming code,Binary logarithm,Combinatorics,Spark (mathematics),Exact algorithm,Massively parallel,Hardness of approximation,Algorithm,Cluster analysis,Mathematics,Single-linkage clustering
DocType
Volume
Citations 
Journal
abs/1710.01431
1
PageRank 
References 
Authors
0.35
1
2
Name
Order
Citations
PageRank
Grigory Yaroslavtsev120917.36
Adithya Vadapalli210.35