Title
Parallel structural graph clustering
Abstract
We address the problem of clustering large graph databases according to scaffolds (i.e., large structural overlaps) that are shared between cluster members. In previous work, an online algorithm was proposed for this task that produces overlapping (non-disjoint) and nonexhaustive clusterings. In this paper, we parallelize this algorithm to take advantage of high-performance parallel hardware and further improve the algorithm in three ways: a refined cluster membership test based on a set abstraction of graphs, sorting graphs according to size, to avoid cluster membership tests in the first place, and the definition of a cluster representative once the cluster scaffold is unique, to avoid cluster comparisons with all cluster members. In experiments on a large database of chemical structures, we show that running times can be reduced by a large factor for one parameter setting used in previous work. For harder parameter settings, it was possible to obtain results within reasonable time for 300,000 structures, compared to 10,000 structures in previous work. This shows that structural, scaffold-based clustering of smaller libraries for virtual screening is already feasible.
Year
DOI
Venue
2011
10.1007/978-3-642-23808-6_17
ECML/PKDD (3)
Keywords
Field
DocType
large database,cluster member,large graph databases,cluster scaffold,cluster membership test,large factor,refined cluster membership test,cluster comparison,previous work,cluster representative,parallel structural graph clustering
Graph,Online algorithm,Graph database,Abstraction,Computer science,Sorting,Theoretical computer science,Cluster analysis,Virtual screening,Clustering coefficient
Conference
Volume
ISSN
Citations 
6913
0302-9743
8
PageRank 
References 
Authors
0.46
13
4
Name
Order
Citations
PageRank
Madeleine Seeland1273.38
Simon A. Berger2717.46
Alexandros Stamatakis399596.27
Stefan Kramer41313141.90