Title
FAST: A Scalable Subgraph Matching Framework over Large Graphs
Abstract
As one of the most fundamental operations in graph analysis, subgraph matching is widely used in various fields such as social network analysis, knowledge graph query, and fraud detection. Due to its NP-complete complexity, sub-graph matching is challenging on large graphs. Previous work is limited on either scalability or the types of queries that can be handled. To address these problems, we propose a fast, scalable subgraph matching framework that consists of filtering, ordering, and enumeration stages. We exploit the parallelism in the filtering stage, and design a learning-based filtering method to remove false matching candidates; propose heuristic constraint and ordering generation methods to improve the matching efficiency; devise a distributed enumeration algorithm that is further optimized with the introduction of graph cache. Our learning- based filtering method delivers over 90% accuracy for basic queries. Compared with Prune.luice, our matching framework achieves 2–8 x speedup in triangle enumeration and up to 3–4 orders of magnitude higher throughput on generic query enumeration. The caching mechanism further boosts the performance by about 1.5 x to 2.5 x on average. Experiments also demonstrate the scalability of our framework. <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sup> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sup> This work is supported by the Open Fund of Science and Technology on Parallel and Distributed Processing Laboratory (PDL). The grant number is WDZC2020SS00101., <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> The source code is available at https://github.com/yixinchen200S/FAST.
Year
DOI
Venue
2022
10.1109/HPEC55821.2022.9926298
2022 IEEE High Performance Extreme Computing Conference (HPEC)
Keywords
DocType
ISSN
scalable subgraph matching framework,fundamental operations,graph analysis,social network analysis,knowledge graph query,NP-complete complexity,sub-graph matching,fast subgraph matching framework,enumeration stages,filtering stage,learning-based filtering,false matching candidates,heuristic constraint,ordering generation methods,matching efficiency,distributed enumeration algorithm,graph cache,filtering method,basic queries,triangle enumeration,generic query enumeration
Conference
2377-6943
ISBN
Citations 
PageRank 
978-1-6654-9787-9
0
0.34
References 
Authors
14
6
Name
Order
Citations
PageRank
Jiezhong He100.34
Zhouyang Liu200.34
Yixin Chen34326299.19
Hengyue Pan400.34
Zhen Huang55720.78
Dongsheng Li629960.22