Abstract | ||
---|---|---|
Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these distributed systems with disk-resident data take a push-based approach to handle messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing memory-resident pull-based approach individually pulls messages for each vertex on demand. Although it can be used to avoid disk operations regarding messages, expensive I/O costs are incurred by random and frequent access to vertices. This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios. We first employ a new block-centric technique (b-pull) to improve the I/O-performance of pulling messages, although the iterative computation is vertex-centric. I/O costs of data accesses are shifted from the receiver side where messages are written/read by push to the sender side where graph data are read by b-pull. Graph data are organized by clustering vertices and edges to achieve high I/O-efficiency in b-pull. Second, we design a seamless switching mechanism and a prominent performance prediction method to guarantee efficiency when switching between push and b-pull. We conduct extensive performance studies to confirm the effectiveness of our proposals over existing up-to-date solutions using a broad spectrum of real-world graphs. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1145/2882903.2882938 | SIGMOD Conference |
Keywords | Field | DocType |
I/O-Efficient,Distributed Graph Computing,Push,Pull | Social network,Vertex (geometry),Computer science,Communication source,Input/output,Cluster analysis,Performance prediction,Database,Computation,Scalability,Distributed computing | Conference |
Citations | PageRank | References |
11 | 0.53 | 24 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhigang Wang | 1 | 11 | 1.88 |
Yu Gu | 2 | 201 | 34.98 |
Yubin Bao | 3 | 11 | 0.53 |
Ge YU | 4 | 1313 | 175.88 |
Jeffrey Xu Yu | 5 | 7018 | 464.96 |