Abstract | ||
---|---|---|
The management and analysis of big data has been identified as one of the most important emerging needs in recent years. This is because of the sheer volume and increasing complexity of data being created or collected. Current clustering algorithms cannot handle big data, and therefore scalable solutions are necessary. Since fuzzy clustering algorithms have been shown to outperform hard clustering approaches in terms of accuracy, this paper investigates the parallelization and scalability of a common and effective fuzzy clustering algorithm, the fuzzy c-means (FCM) algorithm. The algorithm is parallelized using the MapReduce paradigm, with an outline of how the Map and Reduce primitives are implemented. A validity analysis is conducted to show that the implementation works correctly, achieving competitive purity results compared to state-of-the-art clustering algorithms. Furthermore, a scalability analysis is conducted to demonstrate the performance of the parallel FCM implementation with an increasing number of computing nodes. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/s13042-015-0367-0 | Int. J. Machine Learning & Cybernetics |
Keywords | Field | DocType |
MapReduce, Hadoop, Scalability | Fuzzy clustering, Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Data stream clustering, Computer science, Fuzzy logic, Cluster analysis, Big data, Scalability | Journal |
Volume | Issue | ISSN |
6 | 6 | 1868-808X |
Citations | PageRank | References |
35 | 0.96 | 21 |
Authors | ||
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Simone A Ludwig | 1 | 1309 | 179.41 |
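
The abstract describes splitting fuzzy c-means into Map and Reduce primitives but does not reproduce them here. The following is a minimal single-process sketch of one way such a split can look, using the standard FCM update rules; it is not taken from the paper and may differ from its actual job structure. The function names (`memberships`, `map_point`, `reduce_cluster`, `fcm_iteration`) and the in-memory dictionary standing in for the shuffle phase are assumptions made for illustration.

```python
from collections import defaultdict


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point lying exactly on a center is assigned fully to that center.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 if i == zero[0] else 0.0 for i in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def map_point(point, centers, m=2.0):
    """Map step (illustrative): emit (cluster index, (u^m * point, u^m))."""
    for i, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield i, ([w * x for x in point], w)


def reduce_cluster(values, old_center):
    """Reduce step (illustrative): sum partial terms and form the new center."""
    num = [0.0] * len(old_center)
    den = 0.0
    for vec, w in values:
        num = [a + b for a, b in zip(num, vec)]
        den += w
    # Keep the previous center if no point carries any weight for this cluster.
    return [a / den for a in num] if den > 0.0 else list(old_center)


def fcm_iteration(points, centers, m=2.0):
    """One FCM iteration; the dict stands in for the MapReduce shuffle."""
    grouped = defaultdict(list)
    for point in points:
        for key, value in map_point(point, centers, m):
            grouped[key].append(value)
    return [reduce_cluster(grouped[i], centers[i]) for i in range(len(centers))]


if __name__ == "__main__":
    points = [[0.0], [1.0], [9.0], [10.0]]
    centers = [[0.0], [10.0]]
    for _ in range(20):
        centers = fcm_iteration(points, centers)
    print(centers)
```

In a real Hadoop job the map outputs would be partitioned by cluster index across nodes, and a driver would rerun the job until the centers stop moving; both are omitted from this sketch.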