Title
Parallel Two-Phase K-Means
Abstract
In this paper, a new parallel version of Two-Phase K-means, called Parallel Two-Phase K-means (Par2PK-means), is introduced to overcome the limitations of existing parallel versions. Par2PK-means is developed and executed on the MapReduce framework and is divided into two phases. In the first phase, Mappers work independently on data segments to produce intermediate data. In the second phase, the Reducer clusters the intermediate data collected from the Mappers to produce the final clustering result. In tests on large data sets, the proposed algorithm achieved a good speedup over the sequential Two-Phase K-means, close to linear speedup.
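The two-phase flow described above can be illustrated with a small sequential simulation. The abstract does not state what the Mappers' intermediate data contains, so this is only a sketch under an assumption: each Mapper runs k-means on its segment and emits its local centroids together with their point counts, and the Reducer runs a count-weighted k-means over all emitted centroids. The functions `map_phase`, `reduce_phase`, and `two_phase_kmeans` are hypothetical names; this is not the authors' Par2PK-means code and does not use the Hadoop API.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    iters: int = 20, seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Lloyd's k-means in which every point carries a weight.

    Returns the k centers and the total weight assigned to each center.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires len(points) >= k
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest center.
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its points; keep empty ones in place.
        centers = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
                   for j in range(k)]
    return centers, totals


def map_phase(segment: List[Point], k: int) -> List[Tuple[Point, float]]:
    """Hypothetical Mapper: cluster one segment, emit (centroid, count) pairs."""
    centers, counts = weighted_kmeans(segment, [1.0] * len(segment), k)
    return [(c, w) for c, w in zip(centers, counts) if w > 0]


def reduce_phase(intermediate: List[Tuple[Point, float]], k: int) -> List[Point]:
    """Hypothetical Reducer: cluster all Mapper centroids, weighted by their counts."""
    points = [c for c, _ in intermediate]
    weights = [w for _, w in intermediate]
    centers, _ = weighted_kmeans(points, weights, k)
    return centers


def two_phase_kmeans(data: List[Point], k: int, n_segments: int) -> List[Point]:
    """Simulate the MapReduce flow sequentially: split, map each segment, then reduce."""
    segments = [data[i::n_segments] for i in range(n_segments)]
    intermediate = [pair for seg in segments for pair in map_phase(seg, k)]
    return reduce_phase(intermediate, k)


if __name__ == "__main__":
    rng = random.Random(42)
    # Two well-separated synthetic blobs around (0, 0) and (10, 10).
    data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
            + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)])
    print(sorted(two_phase_kmeans(data, k=2, n_segments=4)))
```

Weighting each local centroid by its count preserves every cluster's total mass, so the Reducer sees a compressed but proportionally faithful summary of each segment. In a real MapReduce job the `map_phase` calls would run in parallel on separate nodes, which is where the speedup comes from; the single Reducer only processes about `k * n_segments` weighted points.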
Year
2013
DOI
10.1007/978-3-642-39640-3_16
Venue
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V
Keywords
Data Clustering, K-means, Parallel Distributed Computing, MapReduce
Field
k-means clustering, Data set, Computer science, Parallel computing, Reducer, Cluster analysis, Speedup
DocType
Conference
Volume
7975
ISSN
0302-9743
Citations
5
PageRank
0.45
References
4
Authors
3
Name
Order
Citations
PageRank
Cuong Nguyen120735.89
Dung Tien Nguyen2184.33
Van-Hau Pham3414.56