Title
Gfp-X: A Parallel Approach To Massive Graph Comparison Using Spark
Abstract
The problem of how to compare empirical graphs is an area of great interest within the field of network science. The ability to accurately but efficiently compare graphs has a significant impact in such areas as temporal graph evolution, anomaly detection and protein comparison. The comparison problem is compounded when working with massive graphs containing millions of vertices and edges.This paper introduces a parallel feature extraction based approach for the efficient comparison of large unlabelled graph datasets using Apache Spark. The approach acts by producing a 'Graph Fingerprint' which represents both vertex level and global level topological features from a graph. By using Spark we are able to efficiently compare graphs considered unmanageably large to other approaches. The runtime of the approach is shown to scale sub-linearly with the size and complexity of the graphs being fingerprinted. Importantly, the approach is shown to not only be comparable to existing approaches, but on when comparing topology and size, more sensitive at detecting variation between graphs.
Year
Venue
Keywords
2016
2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)
Graph Similarity, Feature Extraction, Spark
Field
DocType
Citations 
Network science,Data mining,Anomaly detection,Modular decomposition,Spark (mathematics),Computer science,Theoretical computer science,Artificial intelligence,Graph operations,Vertex (geometry),Null model,Clique-width,Machine learning
Conference
2
PageRank 
References 
Authors
0.36
17
5
Name
Order
Citations
PageRank
Stephen Bonner1447.88
John Brennan2184.36
Georgios K. Theodoropoulos318920.78
Ibad Kureshi4245.22
Andrew Stephen Mcgough510511.18