Abstract | ||
---|---|---|
In this paper, a new parallel version of Two-Phase K-means, called Parallel Two-Phase K-means (Par2PK-means), is introduced to overcome the limitations of existing parallel versions. Par2PK-means is developed and executed on the MapReduce framework and is divided into two phases. In the first phase, Mappers work independently on data segments to create intermediate data. In the second phase, the intermediate data collected from the Mappers are clustered by the Reducer to produce the final clustering result. In tests on large data sets, the proposed algorithm attained a good speedup ratio, close to linear speedup, compared with the sequential Two-Phase K-means. |
Year | DOI | Venue |
---|---|---|
2013 | 10.1007/978-3-642-39640-3_16 | COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V |
Keywords | Field | DocType |
Data Clustering, K-means, Parallel Distributed Computing, MapReduce | k-means clustering,Data set,Computer science,Parallel computing,Reducer,Cluster analysis,Speedup | Conference |
Volume | ISSN | Citations |
7975 | 0302-9743 | 5 |
PageRank | References | Authors |
0.45 | 4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cuong Nguyen | 1 | 207 | 35.89 |
Dung Tien Nguyen | 2 | 18 | 4.33 |
Van-Hau Pham | 3 | 41 | 4.56 |
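
The two-phase structure described in the abstract can be illustrated with a small, single-process sketch. The abstract does not say what the intermediate data contain, so this sketch makes an assumption: each Mapper runs plain K-means on its segment and emits `(centroid, count)` pairs, and the Reducer runs a count-weighted K-means over those pairs. The function names (`weighted_kmeans`, `mapper`, `reducer`, `two_phase_sketch`) and the parameters `k_local` and `n_segments` are hypothetical and are not taken from the paper; no real MapReduce framework is used, and the Mappers are simulated with an ordinary loop.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd-style K-means where each point carries a weight.

    Returns the final centroids and the total weight assigned to each
    centroid during the last assignment pass.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires len(points) >= k
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Keep the old centroid when a cluster received no weight.
        centroids = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centroids[j]
            for j in range(k)
        ]
    return centroids, totals


def mapper(segment, k_local, seed=0):
    """Phase 1 (assumed form): summarise one segment as (centroid, count) pairs."""
    centroids, counts = weighted_kmeans(segment, [1.0] * len(segment), k_local, seed=seed)
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]


def reducer(intermediate, k, seed=0):
    """Phase 2 (assumed form): cluster the collected summaries, weighted by count."""
    points = [c for c, _ in intermediate]
    weights = [n for _, n in intermediate]
    return weighted_kmeans(points, weights, k, seed=seed)


def two_phase_sketch(data, k, n_segments=4, k_local=4):
    """Split the data, run the simulated Mappers one by one, then the Reducer."""
    segments = [data[i::n_segments] for i in range(n_segments)]
    intermediate = []
    for i, segment in enumerate(segments):
        intermediate.extend(mapper(segment, k_local, seed=i))
    return reducer(intermediate, k)
```

Calling `two_phase_sketch(data, k=2)` on a list of coordinate tuples returns the final centroids together with the number of original points each one represents. Because the Reducer only sees a few summaries per segment rather than every point, its work stays small as the data grow, which is the kind of property that would make near-linear speedup plausible; the paper's actual intermediate representation may differ.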