Abstract | ||
---|---|---|
SimRank is an effective similarity measure between vertices in a graph, which has become a fundamental technique in graph analytics. Despite its popularity, computation of SimRank is often costly in both space and time, especially with the ever growing scale of graph data nowadays. In this paper, we focus on the computation of Single-Source SimRank: given a query vertex, return the similarities between this vertex and any other vertices in the graph. The traditional centralized SimRank algorithms are not efficient for this problem. To fully utilize the computing power of modern distributed systems, we propose sssSimRank, an efficient distributed algorithm based on the random walk model. Our algorithm achieves scalability via minimizing the total number, the space cost, and the matching time of random walks. We implement our approach on the popular distributed processing platform Spark. Experimental results demonstrate the effectiveness, efficiency and scalability of our method. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/ICPADS.2016.141 | 2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS) |
Keywords | Field | DocType |
graph analytics, big data, SimRank, random walk, distributed algorithm, Spark | Spark (mathematics),Similarity measure,Vertex (geometry),Computer science,Theoretical computer science,Distributed algorithm,Big data,SimRank,Computation,Distributed computing,Scalability | Conference |
ISSN | Citations | PageRank |
1521-9097 | 0 | 0.34 |
References | Authors | |
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xingkun Gao | 1 | 0 | 0.34 |
Nianyuan Bao | 2 | 0 | 0.34 |
Jie Liu | 3 | 17 | 3.33 |
Jie Tang | 4 | 9 | 5.86 |
Gangshan Wu | 5 | 275 | 36.63 |