Title
GraphX: a resilient distributed graph system on Spark
Abstract
From social networks to targeted advertising, big graphs capture the structure in data and are central to recent advances in machine learning and data mining. Unfortunately, directly applying existing data-parallel tools to graph computation tasks can be cumbersome and inefficient. The need for intuitive, scalable tools for graph computation has led to the development of new graph-parallel systems (e.g., Pregel, PowerGraph), which are designed to efficiently execute graph algorithms. However, these new graph-parallel systems do not address the challenges of graph construction and transformation, which are often just as problematic as the subsequent computation. Furthermore, existing graph-parallel systems provide limited fault tolerance and limited support for interactive data mining. We introduce GraphX, which combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework. We leverage new ideas in distributed graph representation to efficiently distribute graphs as tabular data structures. Similarly, we leverage advances in data-flow systems to exploit in-memory computation and fault tolerance. We provide powerful new operations to simplify graph construction and transformation. Using these primitives, we implement the PowerGraph and Pregel abstractions in fewer than 20 lines of code. Finally, by exploiting the Scala foundation of Spark, we enable users to interactively load, transform, and compute on massive graphs.
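The idea of storing a graph as tables and running a Pregel-style computation with data-parallel operations can be illustrated with a minimal sketch. It is written in plain Python, not against the GraphX or Spark APIs; the names `vertices`, `edges`, and `pregel_min_label` are hypothetical, and the in-memory dict and list merely stand in for distributed collections. Each superstep joins the edge table with the vertex table to produce messages, groups the messages by destination vertex, and reduces them to update the vertex table.

```python
from collections import defaultdict

# Vertex table: vertex_id -> attribute (here, the current component label).
# Edge table: (src, dst) rows. Both are illustrative stand-ins for
# distributed collections; this is not the GraphX API.
vertices = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
edges = [(1, 2), (2, 3), (4, 5)]


def pregel_min_label(vertices, edges, max_iters=20):
    """Connected components by min-label propagation, as table operations."""
    labels = dict(vertices)
    for _ in range(max_iters):
        # "Join" every edge row with the vertex table and send the
        # endpoint labels in both directions.
        messages = defaultdict(list)
        for src, dst in edges:
            messages[dst].append(labels[src])
            messages[src].append(labels[dst])
        # "Group by" receiving vertex, reduce with min, update the table.
        updated = {v: min([lab] + messages.get(v, []))
                   for v, lab in labels.items()}
        if updated == labels:  # no label changed, so the run has converged
            break
        labels = updated
    return labels


components = pregel_min_label(vertices, edges)
print(components)
```

Vertices that end with the same label belong to the same connected component. In a data-parallel system, the loop body would correspond to a join followed by a grouped aggregation over partitioned tables.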
Year
2013
DOI
10.1145/2484425.2484427
Venue
GRADES
Keywords
massive graph, graph algorithm, graph computation task, big graph, graph computation, in-memory computation, graph representation, new graph-parallel system, graph system, graph-parallel system, graph construction
Field
Graph database, Scala, Spark (mathematics), Computer science, Theoretical computer science, Wait-for graph, Graph (abstract data type), Source lines of code, Scalability, Computation, Distributed computing
DocType
Conference
Citations
237
PageRank
6.01
References
9
Authors
4
Name                 Order  Citations  PageRank
Reynold Xin          1      2171       81.33
Joseph E. Gonzalez   2      2219       102.68
Michael J. Franklin  3      17423      1681.10
I. Stoica            4      21406      1710.11