Title
Estimation of distance-based metrics for very large graphs with MinHash Signatures
Abstract
We propose a highly efficient and effective algorithm to estimate metrics on very large graphs that are based on the neighborhood function: examples of such metrics are the (effective) diameter and (effective) radius or the average distance. In order to efficiently provide good approximations to the size of the neighborhood set of any node, we refer to the MinHash Signatures approach to derive compressed representations of large sparse datasets but preserving similarity. The technique introduced here, named MinHash Signature Estimation (MHSE), exploits the similarity between signatures to estimate the size of the neighborhood function. We show that MHSE is as effective as HyperANF, which is considered the state of art approach for the estimation of the effective diameter of a very large graph. Indeed, performing both parametric (t-test) and non-parametric (Wilcoxon) statistical tests on residuals for average distance, effective diameter and number of connected pairs, the p-values show that MHSE tends to be more statistically significant than HyperANF. On the other hand, we show that MHSE is a very simple and easily distributable algorithm. In addition, by the property of the signatures to preserve similarity between neighborhoods of nodes, the algorithm can be suitably applied to allow to search and estimate the overlapping size of the most similar neighborhood at different distances.
Year
DOI
Venue
2017
10.1109/BigData.2017.8257969
2017 IEEE International Conference on Big Data (Big Data)
Keywords
DocType
ISSN
Effective Diameter,Approximate Estimation,HyperANF,MinHash Signature,Probabilistic Counters
Conference
2639-1589
ISBN
Citations 
PageRank 
978-1-5386-2716-7
0
0.34
References 
Authors
0
5
Name
Order
Citations
PageRank
Giambattista Amati133528.35
Simone Angelini200.34
G. Gambosi3396.30
Gianluca Rossi423521.60
Paola Vocca522524.79