MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability - Citegraph

Paper Info

Title
MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability

Abstract
The management and analysis of big data has been identified as one of the most important emerging needs in recent years. This is because of the sheer volume and increasing complexity of data being created or collected. Current clustering algorithms cannot handle big data, and therefore scalable solutions are necessary. Since fuzzy clustering algorithms have been shown to outperform hard clustering approaches in terms of accuracy, this paper investigates the parallelization and scalability of a common and effective fuzzy clustering algorithm, the fuzzy c-means (FCM) algorithm. The algorithm is parallelized using the MapReduce paradigm, outlining how the Map and Reduce primitives are implemented. A validity analysis is conducted to show that the implementation works correctly, achieving competitive purity results compared to state-of-the-art clustering algorithms. Furthermore, a scalability analysis is conducted to demonstrate the performance of the parallel FCM implementation with an increasing number of computing nodes.
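
To make the idea of expressing FCM through Map and Reduce primitives concrete, here is a minimal single-process sketch in Python. It is not the paper's implementation: the abstract does not specify the job layout, so the split used here (Map emits membership-weighted partial sums per cluster, Reduce turns them into new centers) is an assumption. The function names `memberships`, `map_point`, `reduce_cluster`, `fcm_iteration` and `fcm` are hypothetical, and the in-memory dictionary only stands in for Hadoop's shuffle phase. The membership and center formulas are the standard FCM updates with fuzzifier `m`.

```python
from collections import defaultdict


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in every cluster."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
             for center in centers]
    # A point that coincides with a center belongs fully to that center.
    zeros = [j for j, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 if j == zeros[0] else 0.0 for j in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]


def map_point(point, centers, m=2.0):
    """Map (assumed layout): emit (cluster index, (weighted point, weight))."""
    for j, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield j, ([w * x for x in point], w)


def reduce_cluster(values):
    """Reduce (assumed layout): sum the partial values of one cluster
    and divide to obtain its new center."""
    num = [0.0] * len(values[0][0])
    den = 0.0
    for vec, w in values:
        num = [a + b for a, b in zip(num, vec)]
        den += w
    return [a / den for a in num]


def fcm_iteration(points, centers, m=2.0):
    """One FCM pass; the dict stands in for the MapReduce shuffle."""
    grouped = defaultdict(list)
    for point in points:
        for key, value in map_point(point, centers, m):
            grouped[key].append(value)
    return [reduce_cluster(grouped[j]) for j in range(len(centers))]


def fcm(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Repeat passes until no center coordinate moves more than tol."""
    for _ in range(max_iter):
        new_centers = fcm_iteration(points, centers, m)
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    data = [[0.0], [0.2], [10.0], [10.2]]
    print(fcm(data, centers=[[1.0], [9.0]]))
```

Because each Map call depends only on one record and the current centers, the records can be partitioned across nodes, which is what makes this formulation a candidate for the kind of scalability analysis the abstract describes. A real Hadoop job would also need to broadcast the centers to every mapper and launch one job per iteration.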

Year	DOI	Venue
2015	10.1007/s13042-015-0367-0	Int. J. Machine Learning & Cybernetics
Keywords	Field	DocType
MapReduce, Hadoop, Scalability	Fuzzy clustering, Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Data stream clustering, Computer science, Fuzzy logic, Cluster analysis, Big data, Scalability	Journal
Volume	Issue	ISSN
6	6	1868-808X
Citations	PageRank	References
35	0.96	21
Authors
Name	Order	Citations	PageRank
Simone A Ludwig	1	1309	179.41
