Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing. - Citegraph

Paper Info

Title
Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing.

Abstract
Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these distributed systems with disk-resident data take a push-based approach to handle messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing memory-resident pull-based approach individually pulls messages for each vertex on demand. Although it can be used to avoid disk operations regarding messages, expensive I/O costs are incurred by random and frequent access to vertices. This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios. We first employ a new block-centric technique (b-pull) to improve the I/O-performance of pulling messages, although the iterative computation is vertex-centric. I/O costs of data accesses are shifted from the receiver side where messages are written/read by push to the sender side where graph data are read by b-pull. Graph data are organized by clustering vertices and edges to achieve high I/O-efficiency in b-pull. Second, we design a seamless switching mechanism and a prominent performance prediction method to guarantee efficiency when switching between push and b-pull. We conduct extensive performance studies to confirm the effectiveness of our proposals over existing up-to-date solutions using a broad spectrum of real-world graphs.

Year	DOI	Venue
2016	10.1145/2882903.2882938	SIGMOD Conference
Keywords	Field	DocType
I/O-Efficient,Distributed Graph Computing,Push,Pull	Social network,Vertex (geometry),Computer science,Communication source,Input/output,Cluster analysis,Performance prediction,Database,Computation,Scalability,Distributed computing	Conference
Citations	PageRank	References
11	0.53	24
Authors
5

Authors (5 rows)

Cited by (11 rows)

References (24 rows)

Name	Order	Citations	PageRank
Zhigang Wang	1	11	1.88
Yu Gu	2	201	34.98
Yubin Bao	3	11	0.53
Ge YU	4	1313	175.88
Jeffrey Xu Yu	5	7018	464.96

1