A Parallel K-Medoids Algorithm for Clustering based on MapReduce - Citegraph

Paper Info

Title
A Parallel K-Medoids Algorithm for Clustering based on MapReduce

Abstract
Clustering, the grouping of data into clusters or categories, is one of the most important machine learning techniques. Several established algorithms exist for clustering small to medium-scale data. In the era of Big Data, applications are large-scale and data-intensive, and the log events they produce have grown significantly in volume, variety and velocity. This makes clustering huge amounts of data more challenging. In this paper, we present a parallel K-Medoids clustering algorithm based on the MapReduce paradigm that can cluster large-scale data. We have kept our solution simple and feasible so that it can handle data of huge volume, variety and velocity. A key distinguishing feature of the proposed algorithm is that, unlike related approaches, it achieves parallelism independent of the number of clusters k to be formed. We have tested our algorithm on large amounts of data and on a real-life case study.

Year	DOI	Venue
2016	10.1109/ICMLA.2016.0089	2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
Keywords	Field	DocType
Clustering, K-Medoids, Big Data, MapReduce	Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Theoretical computer science, Artificial intelligence, Cluster analysis, Canopy clustering algorithm, Data stream clustering, Correlation clustering, Affinity propagation, Algorithm, Constrained clustering, Machine learning	Conference
ISBN	Citations	PageRank
978-1-5090-6168-6	0	0.34
References	Authors
10	2

Authors (2 rows)

Name	Order	Citations	PageRank
M. Omair Shafiq	1	139	18.59
Eric Torunski	2	0	0.68
