Title
Scaling graph traversal to 281 trillion edges with 40 million cores
Abstract
Graph processing, especially high-performance graph traversal, plays an increasingly important role in data analytics. New Sunway, the successor of Sunway TaihuLight, is equipped with nearly 10 PB of memory and over 40 million cores, which makes it possible to process graphs with hundreds of trillions of edges. However, graphs of such unprecedented scale also bring severe performance challenges for graph traversal workloads, including load imbalance, poor locality, and irregular access. To address the scalability problem, we propose a novel 3-level degree-aware 1.5D graph partitioning, which benefits from both delegated 1D and 2D partitioning. By delegating extremely heavy vertices globally and other heavy vertices along the columns and rows of the process mesh, we break the scalability wall of previous partitioning methods. Together with sub-iteration direction optimization, core-group-aware core subgraph segmenting, and a new on-chip sorting mechanism using RMA, we achieve 180,792 GTEPS on a graph with 281 trillion edges, using 103,912 processors with over 40 million cores. This is 1.75X the performance and 8X the capacity of the previous state of the art, and the result conforms to the Graph 500 BFS benchmark [14].
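The 3-level degree-aware idea in the abstract can be illustrated with a small sketch. The abstract does not give the classification rule, so everything below is an assumption made for illustration: the names `Level`, `classify_vertices`, and `owner`, the two degree thresholds, and the modulo-based owner mapping are hypothetical and are not taken from the paper.

```python
from collections import Counter
from enum import Enum


class Level(Enum):
    EXTREME = "delegated globally"            # replicated on every process
    HEAVY = "delegated along mesh row/column"  # replicated on one row and column
    LIGHT = "owned by a single process"        # plain 1D-style ownership


def classify_vertices(edges, extreme_threshold, heavy_threshold):
    """Assign each vertex one of three levels based on its degree.

    Both thresholds are hypothetical tuning parameters, not values
    from the paper.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    levels = {}
    for vertex, d in degree.items():
        if d >= extreme_threshold:
            levels[vertex] = Level.EXTREME
        elif d >= heavy_threshold:
            levels[vertex] = Level.HEAVY
        else:
            levels[vertex] = Level.LIGHT
    return levels


def owner(vertex, rows, cols):
    """Map a vertex to a (row, col) position in a rows x cols process mesh.

    A simple modulo mapping, assumed here only to make the sketch concrete.
    """
    rank = vertex % (rows * cols)
    return divmod(rank, cols)


# Small example: vertex 0 is a hub, vertex 1 is moderately connected.
edges = [(0, i) for i in range(1, 9)] + [(1, 2), (1, 3), (1, 4)]
levels = classify_vertices(edges, extreme_threshold=8, heavy_threshold=4)
```

In this toy graph vertex 0 has degree 8 and is classified as `EXTREME`, vertex 1 has degree 4 and is classified as `HEAVY`, and the remaining vertices are `LIGHT`. The intent, as far as the abstract describes it, is that the few highest-degree vertices are delegated everywhere, moderately heavy ones only along mesh rows and columns, and the rest follow ordinary ownership.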
Year
2022
DOI
10.1145/3503221.3508403
Venue
Principles and Practice of Parallel Programming
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
7
Name            Order  Citations  PageRank
Huanqi Cao      1      0          0.34
Yuanwei Wang    2      0          0.68
Haojie Wang     3      0          1.01
Heng Lin        4      0          0.34
Zixuan Ma       5      1          1.03
Wanwang Yin     6      0          0.34
Wenguang Chen   7      1014       70.57