Title
Coded Computing for Distributed Graph Analytics
Abstract
Many distributed computing systems have been developed recently for implementing graph-based algorithms, such as PageRank, over large-scale graph-structured datasets such as social networks. The performance of these systems suffers significantly from the communication bottleneck, as a large number of messages are exchanged among servers at each step of the computation. Motivated by graph-based MapReduce, we propose a coded computing framework that leverages computation redundancy to alleviate the communication bottleneck in distributed graph processing. As a key contribution of this work, we develop a novel coding scheme that systematically injects structured redundancy in the computation phase to enable coded multicasting opportunities during message exchange between servers, substantially reducing the communication load in large-scale graph processing. For theoretical analysis, we consider random graph models and focus on schemes in which subgraph allocation and Reduce allocation depend only on vertex ID, while the Shuffle design varies with graph connectivity. Specifically, we prove that our proposed scheme enables an (asymptotically) inverse-linear trade-off between computation load and average communication load for two popular random graph models: the Erdős-Rényi model and the power law model. In particular, for a given computation load $r$ (i.e., when each graph vertex is carefully stored at $r$ servers), the proposed scheme slashes the average communication load by (nearly) a multiplicative factor of $r$. Furthermore, for the Erdős-Rényi model, we prove that our proposed scheme is asymptotically optimal as the graph size increases, by providing an information-theoretic converse. To illustrate the benefits of our scheme in practice, we implement PageRank over Amazon EC2, using artificial as well as real-world datasets, demonstrating gains of up to 50.8% over the conventional PageRank implementation. Additionally, we specialize our coded scheme and extend our theoretical results to two other random graph models: the random bipartite model and the stochastic block model. Our specialized schemes asymptotically enable inverse-linear trade-offs between computation and communication loads in distributed graph processing for these popular random graph models as well. We complement the achievability results with converse bounds for both of these models.
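The factor-$r$ reduction comes from the coded multicasting mechanism the abstract refers to: when every input block is replicated on $r$ servers, a server can XOR intermediate values that are each needed by a different peer, and every recipient cancels the summands it can compute locally. The toy Python sketch below illustrates this with 3 servers and computation load r = 2; it is a generic coded-MapReduce-style shuffle, not the paper's graph-specific subgraph/Reduce/Shuffle design, and the block layout and the map_value stand-in are illustrative assumptions.

```python
# Hedged sketch: coded multicasting in a MapReduce-style shuffle (illustrative only).
K = 3                      # number of servers
r = 2                      # computation load: each input block is stored on r servers

# Input blocks (e.g., subgraphs) 1..6, each placed on r = 2 servers (assumed layout).
storage = {
    1: {1, 2, 3, 4},       # server 1 stores blocks 1-4
    2: {1, 2, 5, 6},       # server 2 stores blocks 1, 2, 5, 6
    3: {3, 4, 5, 6},       # server 3 stores blocks 3-6
}

def map_value(reducer, block):
    """Toy intermediate value v_{reducer, block} as a small integer (a stand-in for
    the real Map output), so coded messages can be formed and decoded with XOR."""
    return (1_000_003 * reducer + 7919 * block) % 251

# Server k reduces output partition k; it needs v_{k, block} for every block, but it
# can only compute those for blocks it stores locally.
needed = {k: [(k, b) for b in range(1, 7) if b not in storage[k]] for k in storage}

# Uncoded shuffle: every missing intermediate value is unicast separately.
uncoded_load = sum(len(v) for v in needed.values())             # 6 messages

# Coded shuffle: each server XORs two values it can compute locally, each needed by a
# different one of the other two servers, and multicasts the single XOR.
coded_messages = {
    1: map_value(2, 3) ^ map_value(3, 1),   # useful to servers 2 and 3
    2: map_value(1, 5) ^ map_value(3, 2),   # useful to servers 1 and 3
    3: map_value(1, 6) ^ map_value(2, 4),   # useful to servers 1 and 2
}
coded_load = len(coded_messages)                                # 3 multicasts

# Decoding: each receiver cancels the summand it can compute from its own blocks.
# Server 1 stores block 2, so it computes v(3,2) and recovers v(1,5) from server 2:
assert coded_messages[2] ^ map_value(3, 2) == map_value(1, 5)
# Server 1 also stores block 4, so it recovers v(1,6) from server 3's multicast:
assert coded_messages[3] ^ map_value(2, 4) == map_value(1, 6)

print(f"uncoded shuffle load: {uncoded_load} messages")
print(f"coded shuffle load:   {coded_load} multicasts "
      f"(reduction factor = {uncoded_load / coded_load} = r)")
```

In this assumed setup the coded shuffle sends 3 multicasts instead of 6 unicasts, matching the inverse-linear trade-off (a factor-r reduction for r = 2) that the abstract states for the paper's graph-specific schemes.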
Year: 2020
DOI: 10.1109/TIT.2020.2999675
Venue: IEEE Transactions on Information Theory
Keywords: Servers, Load modeling, Computational modeling, Resource management, Adaptation models, Redundancy, Encoding
DocType: Journal
Volume: 66
Issue: 10
ISSN: 0018-9448
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name, Order, Citations, PageRank
Saurav Prakash1483.28
Amirhossein Reisizadehmobarakeh2616.44
Ramtin Pedarsani317129.35
Amir Salman Avestimehr41880157.39