Abstract | ||
---|---|---|
The problem of how to compare empirical graphs is an area of great interest within the field of network science. The ability to accurately but efficiently compare graphs has a significant impact in such areas as temporal graph evolution, anomaly detection and protein comparison. The comparison problem is compounded when working with massive graphs containing millions of vertices and edges.This paper introduces a parallel feature extraction based approach for the efficient comparison of large unlabelled graph datasets using Apache Spark. The approach acts by producing a 'Graph Fingerprint' which represents both vertex level and global level topological features from a graph. By using Spark we are able to efficiently compare graphs considered unmanageably large to other approaches. The runtime of the approach is shown to scale sub-linearly with the size and complexity of the graphs being fingerprinted. Importantly, the approach is shown to not only be comparable to existing approaches, but on when comparing topology and size, more sensitive at detecting variation between graphs. |
Year | Venue | Keywords |
---|---|---|
2016 | 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | Graph Similarity, Feature Extraction, Spark |
Field | DocType | Citations |
Network science,Data mining,Anomaly detection,Modular decomposition,Spark (mathematics),Computer science,Theoretical computer science,Artificial intelligence,Graph operations,Vertex (geometry),Null model,Clique-width,Machine learning | Conference | 2 |
PageRank | References | Authors |
0.36 | 17 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Stephen Bonner | 1 | 44 | 7.88 |
John Brennan | 2 | 18 | 4.36 |
Georgios K. Theodoropoulos | 3 | 189 | 20.78 |
Ibad Kureshi | 4 | 24 | 5.22 |
Andrew Stephen Mcgough | 5 | 105 | 11.18 |