Abstract | ||
---|---|---|
We present KeyBin2, a key-based clustering method that is able to learn from distributed data in parallel. KeyBin2 uses random projections and discrete optimizations to efficiently cluster very high-dimensional data. Because it is based on keys computed independently per dimension and per data point, KeyBin2 scales linearly. We perform accuracy and scalability tests to evaluate our algorithm's performance using synthetic and real datasets. The experiments show that KeyBin2 outperforms other parallel clustering methods for problems with increased complexity. Finally, we present an application of KeyBin2 for in-situ clustering of protein folding trajectories. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3225058.3225149 | Proceedings of the 47th International Conference on Parallel Processing |
Keywords | Field | DocType |
Clustering, Random Projection, Map-Reduce, Privacy Preserving | Random projection, Clustering high-dimensional data, Computer science, Parallel computing, Algorithm, In situ analysis, Cluster analysis, Scalability | Conference |
ISSN | Citations | PageRank |
0190-3918 | 0 | 0.34 |
References | Authors | |
33 | 5 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Xinyu Chen | 1 | 29 | 7.43 |
Jeremy Benson | 2 | 0 | 1.35 |
Matt Peterson | 3 | 0 | 0.34 |
Michela Taufer | 4 | 352 | 53.04 |
Trilce Estrada | 5 | 120 | 18.27 |