Title
MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability
Abstract
The management and analysis of big data has been identified as one of the most important emerging needs in recent years, because of the sheer volume and increasing complexity of the data being created or collected. Current clustering algorithms cannot handle data at this scale, and therefore scalable solutions are necessary. Since fuzzy clustering algorithms have been shown to outperform hard clustering approaches in terms of accuracy, this paper investigates the parallelization and scalability of a common and effective fuzzy clustering algorithm, the fuzzy c-means (FCM) algorithm. The algorithm is parallelized using the MapReduce paradigm, and the paper outlines how the Map and Reduce primitives are implemented. A validity analysis shows that the implementation works correctly, achieving purity results competitive with state-of-the-art clustering algorithms. Furthermore, a scalability analysis evaluates how the parallel FCM implementation performs as the number of computing nodes increases.
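This record does not reproduce the paper's actual Map and Reduce design. As a hedged illustration only, the sketch below assumes a common decomposition in which each FCM iteration is one MapReduce job. The mapper computes the standard FCM memberships u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) for each point in its input split and emits partial sums keyed by cluster. The reducer then forms the new centroid c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m. The function names (memberships, map_phase, reduce_phase, fcm_mapreduce, purity), the fuzzifier m = 2, and the in-process simulated shuffle are assumptions made for illustration; they are not Hadoop APIs or the author's code. Purity is computed after assigning each point to its highest-membership cluster, which is one common way to score a fuzzy partition against class labels.

```python
import math
from collections import defaultdict

M = 2.0      # fuzzifier m > 1 (assumed common default, not taken from the paper)
EPS = 1e-12  # distance below which a point is treated as sitting on a centroid


def memberships(x, centroids, m=M):
    """Standard FCM membership u_j of point x in each cluster j."""
    d = [math.dist(x, c) for c in centroids]
    for j, dj in enumerate(d):
        if dj < EPS:  # point coincides with centroid j: crisp assignment
            return [1.0 if k == j else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def map_phase(chunk, centroids, m=M):
    """Hypothetical mapper: for every point in its split, emit
    (cluster_id, (u^m * x, u^m)) so reducers can form weighted means."""
    for x in chunk:
        for j, u in enumerate(memberships(x, centroids, m)):
            w = u ** m
            yield j, ([w * xi for xi in x], w)


def reduce_phase(key, values):
    """Hypothetical reducer: sum partial numerators and denominators for one
    cluster and return its updated centroid (None if it received no weight)."""
    num, den = None, 0.0
    for vec, w in values:
        num = list(vec) if num is None else [a + b for a, b in zip(num, vec)]
        den += w
    if den == 0.0:
        return key, None
    return key, [a / den for a in num]


def fcm_mapreduce(chunks, centroids, m=M, max_iter=100, tol=1e-6):
    """Hypothetical driver: one simulated MapReduce job per FCM iteration."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        groups = defaultdict(list)  # simulated shuffle: group values by key
        for chunk in chunks:        # each chunk stands in for one mapper's split
            for key, value in map_phase(chunk, centroids, m):
                groups[key].append(value)
        new = [list(c) for c in centroids]
        for key, values in groups.items():
            _, c = reduce_phase(key, values)
            if c is not None:       # keep the old centroid if it got no weight
                new[key] = c
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:             # stop once no centroid moves noticeably
            break
    return centroids


def purity(points, labels, centroids, m=M):
    """Purity after hardening each point to its highest-membership cluster."""
    counts = defaultdict(lambda: defaultdict(int))
    for x, y in zip(points, labels):
        u = memberships(x, centroids, m)
        counts[max(range(len(u)), key=u.__getitem__)][y] += 1
    return sum(max(c.values()) for c in counts.values()) / len(points)


# Usage: two input splits, each standing in for the data of one mapper.
chunks = [
    [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2)],
    [(5.1, 4.9), (0.2, 0.1), (4.9, 5.2)],
]
points = [p for chunk in chunks for p in chunk]
labels = ["a", "b", "a", "b", "a", "b"]
final = fcm_mapreduce(chunks, [(0.0, 1.0), (4.0, 4.0)])
print(purity(points, labels, final))  # 1.0
```

In a real Hadoop job, a combiner would typically pre-sum each mapper's output per cluster, so that only c partial sums per split cross the network instead of one record per point and cluster.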
Year
2015
DOI
10.1007/s13042-015-0367-0
Venue
Int. J. Machine Learning & Cybernetics
Keywords
MapReduce, Hadoop, Scalability
Field
Fuzzy clustering, Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Data stream clustering, Computer science, Fuzzy logic, Cluster analysis, Big data, Scalability
DocType
Journal
Volume
6
Issue
6
ISSN
1868-808X
Citations
35
PageRank
0.96
References
21
Authors
1
Name
Simone A Ludwig
Order
1
Citations
1309
PageRank
179.41