Abstract | ||
---|---|---|
This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics, but the majority of existing algorithms for massive graphs are sequential, while distributed versions of algorithms do not guarantee their CPU, I/O, Memory, or Network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting sub graphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. PDTL suits a variety of computational environments, from single-core machines to high-end clusters, and computes the exact triangle count on graphs of over 6B edges and 1B vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems Power Graph, OPT, and PATRIC by 2x to 4x. Our approach thus highlights the importance of I/O in a distributed environment. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/ICPP.2015.46 | ICPP |
Keywords | Field | DocType |
Triangle Listing, Triangle Counting, Big Data, Massive Graphs, I/O-Efficient Algorithm, Distributed Algorithm, Parallel Algorithm | Graph,Indifference graph,Vertex (geometry),Distributed Computing Environment,Computer science,Parallel computing,Theoretical computer science,Triangulation (social science),Pathwidth,Maximal independent set,Distributed computing | Conference |
ISSN | Citations | PageRank |
0190-3918 | 9 | 0.52 |
References | Authors | |
17 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ilias Giechaskiel | 1 | 33 | 6.61 |
George Panagopoulos | 2 | 14 | 5.08 |
Eiko Yoneki | 3 | 1605 | 94.23 |