Title
Large graph processing in the cloud
Abstract
As the study of graphs, such as web and social graphs, becomes increasingly popular, the requirements of efficiency and programming flexibility of large graph processing tasks challenge existing tools. We propose to demonstrate Surfer, a large graph processing engine designed to execute in the cloud. Surfer provides two basic primitives for programmers: MapReduce and propagation. MapReduce, originally developed by Google, processes different key-value pairs in parallel, and propagation is an iterative computational pattern that transfers information along the edges from a vertex to its neighbors in the graph. These two primitives are complementary in graph processing. MapReduce is suitable for processing flat data structures, such as vertex-oriented tasks, and propagation is optimized for edge-oriented tasks on partitioned graphs. To further improve the programmability of large graph processing, Surfer provides a small set of high-level building blocks that use these two primitives. Developers may also construct custom building blocks. Surfer further provides a graphical user interface (GUI) with which developers can visually create large graph processing tasks. Surfer transforms a task into an execution plan composed of MapReduce and propagation operations. It then automatically applies various optimizations to improve the efficiency of distributed execution. Surfer also provides a visualization tool that monitors the detailed execution dynamics of the execution plan and shows the tradeoffs between MapReduce and propagation. We demonstrate our system in two ways: first, we demonstrate the ease-of-programming features of the system; second, we show the efficiency of the system with a series of applications on a social network. We find that Surfer is simple to use and is highly efficient for large graph-based tasks.
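The two primitives named in the abstract can be illustrated with a small single-machine sketch. This is not Surfer's API: the function names `map_reduce` and `propagate`, the `transfer`/`combine` callbacks, the toy edge list, and the PageRank-style update with a 0.85 damping factor are all assumptions chosen for illustration. A vertex-oriented step (counting out-degrees) is written in MapReduce style, and an edge-oriented step (sending a value along each edge and combining it at the target vertex) is written as a propagation.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical MapReduce: map each record to (key, value) pairs, then reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, vals) for key, vals in groups.items()}


def propagate(edges, values, transfer, combine):
    """Hypothetical propagation: one iteration that sends a value along every
    edge (transfer), then merges the received values at each vertex (combine)."""
    inbox = defaultdict(list)
    for src, dst in edges:
        inbox[dst].append(transfer(src, dst, values[src]))
    return {v: combine(v, inbox.get(v, [])) for v in values}


# Toy directed graph (assumed data, not from the paper).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
vertices = ("a", "b", "c")

# Vertex-oriented task in MapReduce style: out-degree of each vertex.
out_degree = map_reduce(
    edges,
    mapper=lambda edge: [(edge[0], 1)],
    reducer=lambda vertex, ones: sum(ones),
)

# Edge-oriented task as repeated propagation: a PageRank-style update.
ranks = {v: 1.0 / len(vertices) for v in vertices}
for _ in range(20):
    ranks = propagate(
        edges,
        ranks,
        transfer=lambda src, dst, rank: rank / out_degree[src],
        combine=lambda v, received: 0.15 / len(vertices) + 0.85 * sum(received),
    )
```

In this sketch every vertex has at least one outgoing edge, so the ranks keep summing to 1 across iterations. A distributed engine would additionally partition the graph and ship the transferred values between machines, which the sketch leaves out.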
Year: 2010
DOI: 10.1145/1807167.1807297
Venue: SIGMOD Conference
Keywords: partitioned graph, large graph processing engine, social graph, execution plan, propagation operation, graph processing, propagation, large graph-based task, distributed systems, detailed execution dynamic, large graph processing, large graph processing task, mapreduce, data structure, distributed system, graphic user interface, social network, process engineering
Field: Data mining, Social network, Computer science, Theoretical computer science, Graphical user interface, Small set, Distributed computing, Data structure, Graph, Vertex (geometry), Visualization, Database, Cloud computing
DocType: Conference
Citations: 41
PageRank: 1.78
References: 7
Authors: 4
Name | Order | Citations | PageRank
Rishan Chen | 1 | 326 | 17.81
Xuetian Weng | 2 | 222 | 8.93
Bingsheng He | 3 | 2810 | 179.09
Mao Yang | 4 | 496 | 30.94