Title |
---|
TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC |
Abstract |
---|
Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph-structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster to provide reasonable performance. To address this problem, a disk-based parallel graph engine called Graph-Chi has recently been proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism, including multi-core parallelism and FlashSSD I/O parallelism, and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model called pin-and-slide. TurboGraph also provides engine-level operators such as BFS, which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi, by up to four orders of magnitude. Executable files for TurboGraph are available at http://wshan.net/turbograph. |
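
The abstract credits part of TurboGraph's speed-up to overlapping CPU processing with I/O processing, but it does not describe how the pin-and-slide model achieves this. As a rough illustration of the general overlap idea only (a generic producer/consumer scheme, not TurboGraph's pin-and-slide model), the Python sketch below runs a level-synchronous BFS in which a reader thread streams adjacency "pages" into a small bounded buffer while the main thread processes pages that have already arrived. The page layout (a list of dicts mapping vertex to out-neighbours) and the names `read_pages`, `bfs_overlapped`, and `buffer_pages` are hypothetical. Under CPython's GIL, real overlap happens only when the reader blocks on actual disk I/O; here the pages are in-memory stand-ins.

```python
import queue
import threading


def read_pages(pages, buf):
    """Hypothetical I/O thread: stream each page into a bounded buffer."""
    for page in pages:
        buf.put(page)  # blocks while the buffer is full
    buf.put(None)      # sentinel: no more pages in this pass


def bfs_overlapped(pages, source, buffer_pages=2):
    """Level-synchronous BFS over adjacency lists split into pages.

    pages: list of dicts, each mapping vertex -> list of out-neighbours
    (a hypothetical in-memory stand-in for pages of an on-disk graph).
    Returns a dict mapping every reachable vertex to its BFS level.
    """
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        buf = queue.Queue(maxsize=buffer_pages)
        reader = threading.Thread(target=read_pages, args=(pages, buf))
        reader.start()
        next_frontier = set()
        # Processing page k can proceed while the reader fetches page k+1.
        while (page := buf.get()) is not None:
            for v, nbrs in page.items():
                if v in frontier:
                    for w in nbrs:
                        if w not in level:
                            level[w] = depth + 1
                            next_frontier.add(w)
        reader.join()
        frontier = next_frontier
        depth += 1
    return level
```

For example, with `pages = [{0: [1, 2], 1: [3]}, {2: [3], 3: [4]}, {4: []}]`, `bfs_overlapped(pages, 0)` returns `{0: 0, 1: 1, 2: 1, 3: 2, 4: 3}`. The bounded buffer caps how many pages sit in memory at once; a larger `buffer_pages` lets the reader run further ahead of the processing thread.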
Year | DOI | Venue |
---|---|---|
2013 | 10.1145/2487575.2487581 | Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining |
Keywords | Document type | Citations |
---|---|---|
big data | Conference | 107 |
PageRank | References | Authors |
---|---|---|
2.77 | 17 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wook-Shin Han | 1 | 805 | 57.85 |
Sangyeon Lee | 2 | 133 | 3.82 |
Kyungyeol Park | 3 | 131 | 3.46 |
Jeong-Hoon Lee | 4 | 291 | 16.06 |
Min-Soo Kim | 5 | 140 | 4.92 |
Jin-ha Kim | 6 | 329 | 18.78 |
Hwanjo Yu | 7 | 1715 | 114.02 |