Title
Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing.
Abstract
Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these distributed systems with disk-resident data take a push-based approach to handle messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing memory-resident pull-based approach individually pulls messages for each vertex on demand. Although it can be used to avoid disk operations regarding messages, expensive I/O costs are incurred by random and frequent access to vertices. This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios. We first employ a new block-centric technique (b-pull) to improve the I/O-performance of pulling messages, although the iterative computation is vertex-centric. I/O costs of data accesses are shifted from the receiver side where messages are written/read by push to the sender side where graph data are read by b-pull. Graph data are organized by clustering vertices and edges to achieve high I/O-efficiency in b-pull. Second, we design a seamless switching mechanism and a prominent performance prediction method to guarantee efficiency when switching between push and b-pull. We conduct extensive performance studies to confirm the effectiveness of our proposals over existing up-to-date solutions using a broad spectrum of real-world graphs.
Year
DOI
Venue
2016
10.1145/2882903.2882938
SIGMOD Conference
Keywords
Field
DocType
I/O-Efficient,Distributed Graph Computing,Push,Pull
Social network,Vertex (geometry),Computer science,Communication source,Input/output,Cluster analysis,Performance prediction,Database,Computation,Scalability,Distributed computing
Conference
Citations 
PageRank 
References 
11
0.53
24
Authors
5
Name
Order
Citations
PageRank
Zhigang Wang1111.88
Yu Gu220134.98
Yubin Bao3110.53
Ge YU41313175.88
Jeffrey Xu Yu57018464.96