Title
A MapReduce-Based Parallel Clustering Algorithm for Large Protein-Protein Interaction Networks.
Abstract
Clustering proteins, or identifying functionally related proteins, in Protein-Protein Interaction (PPI) networks is one of the most computation-intensive problems in the proteomics community. Most research has focused on improving the accuracy of clustering algorithms. However, the high computational cost of these algorithms, such as Girvan and Newman's clustering algorithm, has been an obstacle to their use on large-scale PPI networks. In this paper, we propose an algorithm, called Clustering-MR, to address this problem. Our solution can effectively parallelize Girvan and Newman's edge-betweenness-based clustering algorithm using MapReduce. We evaluated the performance of Clustering-MR in a cloud environment with testing datasets of different sizes and different numbers of worker nodes. The experimental results show that Clustering-MR can achieve high performance on large-scale PPI networks with more than 1000 proteins or 5000 interactions. © Springer-Verlag 2012.
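The abstract does not say how Clustering-MR divides the work, so the following is only a minimal single-machine Python sketch under one common assumption: each map task runs a Brandes-style breadth-first search from one source node and emits partial edge-betweenness scores, and the reduce step sums those scores per edge. The Girvan and Newman loop then repeatedly removes the highest-betweenness edge until the network splits into the requested number of clusters. The function names (map_source, reduce_edges, edge_betweenness, girvan_newman) and the stopping parameter k are hypothetical, not taken from the paper, and no Hadoop API is used.

```python
from collections import defaultdict, deque
from itertools import chain


def map_source(graph, s):
    """Hypothetical map task: Brandes-style BFS from source s.

    Returns (edge, partial_betweenness) pairs for all shortest paths starting at s.
    """
    dist, sigma, preds, order = {s: 0}, defaultdict(int), defaultdict(list), []
    sigma[s] = 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:             # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta, emitted = defaultdict(float), []
    for w in reversed(order):             # accumulate dependencies, farthest nodes first
        for v in preds[w]:
            c = sigma[v] / sigma[w] * (1.0 + delta[w])
            emitted.append((frozenset((v, w)), c))
            delta[v] += c
    return emitted


def reduce_edges(pairs):
    """Hypothetical reduce task: sum the partial scores emitted for each edge."""
    totals = defaultdict(float)
    for edge, c in pairs:
        totals[edge] += c
    # each unordered node pair was counted once from each endpoint
    return {edge: total / 2.0 for edge, total in totals.items()}


def edge_betweenness(graph):
    # one map task per source node; a MapReduce job would run these on workers
    return reduce_edges(chain.from_iterable(map_source(graph, s) for s in list(graph)))


def connected_components(graph):
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def girvan_newman(edges, k):
    """Remove the highest-betweenness edge until there are at least k components."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    while len(connected_components(graph)) < k:
        betweenness = edge_betweenness(graph)   # recomputed after every removal
        if not betweenness:                      # no edges left to cut
            break
        u, v = tuple(max(betweenness, key=betweenness.get))
        graph[u].discard(v)
        graph[v].discard(u)
    return connected_components(graph)
```

With two triangles joined by one bridge edge, edge_betweenness gives the bridge a score of 9 (it lies on the shortest path of every one of the 3 × 3 cross-triangle node pairs), so girvan_newman(edges, 2) removes it first and returns the two triangles. Recomputing betweenness after each removal is what makes the method expensive, and the independent per-source map tasks are the part a MapReduce runtime could spread across worker nodes.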
Year
2012
DOI
10.1007/978-3-642-35527-1_12
Venue
ADMA
Keywords
clustering, edge-betweenness, mapreduce, ppi
Field
Protein protein interaction network, Data mining, Obstacle, Computer science, Theoretical computer science, Artificial intelligence, Cluster analysis, Machine learning, Computation, Cloud computing
DocType
Conference
Volume
7713 (LNAI)
Issue
n/a
ISSN
1611-3349
Citations
0
PageRank
0.34
References
7
Authors
9
Name           Order  Citations  PageRank
Li Liu         1      634        47.50
Dangping Fan   2      17         2.04
Ming Liu       3      135        13.12
Guandong Xu    4      640        75.03
Shiping Chen   5      152        10.08
Yuan Zhou      6      19         13.47
Xiwei Chen     7      7          2.20
Qianru Wang    8      17         2.23
Yufeng Wei     9      0          1.01