Title
KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis
Abstract
We present KeyBin2, a key-based clustering method that learns from distributed data in parallel. KeyBin2 uses random projections and discrete optimizations to efficiently cluster very high-dimensional data. Because it is based on keys computed independently per dimension and per data point, KeyBin2 scales linearly. We perform accuracy and scalability tests to evaluate our algorithm's performance using synthetic and real datasets. The experiments show that KeyBin2 outperforms other parallel clustering methods for problems with increased complexity. Finally, we present an application of KeyBin2 for in-situ clustering of protein folding trajectories.
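The idea of per-point, per-dimension keys after a random projection can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (random_projection, make_key, key_histogram), the Gaussian projection matrix, the fixed equal-width bins, and the assumption that all workers share the same projection seed and bin ranges are hypothetical choices made only for illustration. The abstract does not specify any of these details, and the discrete optimization step it mentions is not covered here.

```python
import random
from collections import Counter

def random_projection(points, out_dim, seed=0):
    """Project each point onto out_dim random Gaussian directions.

    Assumption: every worker uses the same seed, so all partitions
    are projected with the same (hypothetical) matrix.
    """
    rng = random.Random(seed)
    in_dim = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]

def make_key(point, lows, highs, bins):
    """Bin each coordinate independently; the tuple of bin indices is the key."""
    key = []
    for x, lo, hi in zip(point, lows, highs):
        if hi == lo:
            key.append(0)  # degenerate range: single bin
            continue
        idx = int((x - lo) / (hi - lo) * bins)
        key.append(min(max(idx, 0), bins - 1))  # clamp to a valid bin
    return tuple(key)

def key_histogram(points, lows, highs, bins):
    """Count how many points fall under each key (one pass over the data)."""
    return Counter(make_key(p, lows, highs, bins) for p in points)

if __name__ == "__main__":
    data = [[0.1, 0.9], [0.15, 0.85], [0.8, 0.2], [0.9, 0.1]]
    lows, highs = [0.0, 0.0], [1.0, 1.0]
    # Each "worker" builds a histogram over its own partition ...
    part_a = key_histogram(data[:2], lows, highs, bins=4)
    part_b = key_histogram(data[2:], lows, highs, bins=4)
    # ... and the partial histograms are merged by simple addition.
    print(part_a + part_b)
```

Because each key depends only on its own point, the work is a single pass over the data, and partial histograms from different partitions can be merged by addition without exchanging the raw points. Dense keys in the merged histogram would then be candidates for cluster cores in a scheme of this kind.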
Year
2018
DOI
10.1145/3225058.3225149
Venue
Proceedings of the 47th International Conference on Parallel Processing
Keywords
Clustering, Random Projection, Map-Reduce, Privacy Preserving
Field
Random projection, Clustering high-dimensional data, Computer science, Parallel computing, Algorithm, In situ analysis, Cluster analysis, Scalability
DocType
Conference
ISSN
0190-3918
Citations
0
PageRank
0.34
References
33
Authors
5
Name              Order   Citations   PageRank
Xinyu Chen        1       29          7.43
Jeremy Benson     2       0           1.35
Matt Peterson     3       0           0.34
Michela Taufer    4       352         53.04
Trilce Estrada    5       120         18.27