Title
Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks
Abstract
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
Year
DOI
Venue
2013
10.1016/j.jpdc.2012.09.009
J. Parallel Distrib. Comput.
Keywords
Field
DocType
communication path,comparative performance analysis,straightforward parallel algorithm,ideal centralised algorithm,asynchronous large-scale network,single communication failure,global communication,k-means algorithm,proposed algorithm,fault tolerant global communication,fault tolerant decentralised k-means,cluster analysis,gossip protocols,k means,k
Asynchronous communication,Computer science,Parallel algorithm,Massively parallel,Parallel computing,Fault tolerance,Gossip protocol,Cluster analysis,Wireless sensor network,Distributed computing,Scalability
Journal
Volume
Issue
ISSN
73
3
0743-7315
Citations 
PageRank 
References 
28
0.97
28
Authors
4
Name
Order
Citations
PageRank
Giuseppe Di Fatta152939.23
Francesco Blasa2411.55
Simone Cafiero3411.55
Giancarlo Fortino41756155.44