Title
Enabling scalable and accurate clustering of distributed ligand geometries on supercomputers
Abstract
Scalable method to cluster molecules from docking simulations on distributed systems.
Projections and interpolations into 3-D and 6-D spaces capture molecular geometries.
Our approach scales up to 2048 processing cores and 2 TB of input data.
Our approach is more accurate than energy-based and centralized clustering methods.

We present an efficient and accurate clustering method for analyzing protein-ligand docking datasets on large distributed-memory systems. For each ligand conformation in the dataset, our clustering algorithm first extracts relevant geometrical properties and transforms them into a single metadata point in an N-dimensional (N-D) space. It then performs N-D clustering on the metadata to search for predominant clusters. Because relevant data properties are extracted locally and concurrently, our method avoids moving ligand conformations among nodes. By doing so, we transform the analysis problem (e.g., clustering or classification) into a search for property aggregates. Our analysis shows that on small computer systems of up to 64 nodes, performance is not sensitive to data content and distribution. On larger systems of up to 256 nodes, the scalability of simulations that converge strongly toward specific geometries is less sensitive to the overhead of shuffling metadata. We also demonstrate that our metadata extraction method captures the geometrical properties of ligand conformations more effectively, and clusters and predicts near-native ligand conformations more accurately, than traditional methods, including hierarchical clustering and energy-based scoring.
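The two-stage pipeline described in the abstract (per-conformation metadata extraction, then N-D clustering in which only aggregates move between nodes) can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' implementation. The descriptor in the hypothetical `extract_metadata` (least-squares slopes of the atom coordinates projected onto the xy, yz, and zx planes) is an assumed example of a 3-D geometric projection, and the hypothetical `octree_densest` is a generic octree-style density descent in which each level splits every cell into 2^N children. Each inner list of `ranks` plays the role of one node's local metadata, and summing the per-node `Counter` objects stands in for the reduction a real distributed code would perform, so only cell counts, never conformations, cross node boundaries.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def ls_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys regressed on xs (0.0 if xs has no spread)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def extract_metadata(atoms: Sequence[Tuple[float, float, float]]) -> Point:
    """Hypothetical 3-D descriptor: slopes of the atom projections onto the
    xy, yz and zx planes. Computed locally, so no atoms leave the node."""
    xs, ys, zs = zip(*atoms)
    return (ls_slope(xs, ys), ls_slope(ys, zs), ls_slope(zs, xs))


def cell_key(p: Point, lo: Sequence[float], hi: Sequence[float], depth: int) -> Tuple[int, ...]:
    """Index of the octree cell (2**depth cells per axis) that contains p."""
    n = 1 << depth
    key = []
    for v, a, b in zip(p, lo, hi):
        span = (b - a) or 1.0  # guard against a degenerate axis
        key.append(min(max(int((v - a) / span * n), 0), n - 1))
    return tuple(key)


def local_counts(points, lo, hi, depth, parent) -> Counter:
    """Count one node's points per child cell of `parent`; only counts move."""
    counts = Counter()
    for p in points:
        if cell_key(p, lo, hi, depth - 1) == parent:
            counts[cell_key(p, lo, hi, depth)] += 1
    return counts


def octree_densest(ranks: List[List[Point]], max_depth: int = 6, min_points: int = 2):
    """Descend toward the densest cell; returns (depth, cell, count)."""
    dim = len(next(p for pts in ranks for p in pts))
    # Global bounding box; on a real cluster this would be a min/max reduction.
    lo = [min(p[i] for pts in ranks for p in pts) for i in range(dim)]
    hi = [max(p[i] for pts in ranks for p in pts) for i in range(dim)]
    depth, parent = 0, (0,) * dim  # depth 0: one cell holding every point
    count = sum(len(pts) for pts in ranks)
    for d in range(1, max_depth + 1):
        total = Counter()
        for pts in ranks:  # stands in for summing per-node counts (a reduce)
            total.update(local_counts(pts, lo, hi, d, parent))
        key, c = max(total.items(), key=lambda kv: (kv[1], kv[0]))
        if c < min_points:  # densest child too sparse: stop refining
            break
        depth, parent, count = d, key, c
    return depth, parent, count


# Three simulated nodes, each holding local 3-D metadata points.
ranks = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.2, 1.1, 1.3), (8.0, 8.0, 8.0)],
    [(1.1, 1.3, 1.2), (5.0, 2.0, 6.0)],
]
print(octree_densest(ranks))  # (4, (2, 2, 2), 3)
```

In this toy input the densest region (three near-identical metadata points) is found at depth 4 even though each simulated node holds only one of its points, because the nodes exchange cell counts rather than data. Since `cell_key` is written for any number of dimensions, the same descent would apply unchanged to 6-D metadata.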
Year
2017
DOI
10.1016/j.parco.2017.02.005
Venue
Parallel Computing
Keywords
Ligand conformations, Protein-ligand docking, Drug design, Octree-based clustering, N-dimensional clustering, In situ analytics
Field
Hierarchical clustering, Cluster (physics), Metadata, Correlation clustering, Computer science, Protein–ligand docking, Theoretical computer science, Shuffling, Cluster analysis, Scalability
DocType
Journal
Volume
63
Issue
C
ISSN
0167-8191
Citations
1
PageRank
0.35
References
23
Authors
5
Name            Order  Citations  PageRank
Boyu Zhang      1      71         17.54
Trilce Estrada  2      120        18.27
Pietro Cicotti  3      101        14.52
Pavan Balaji    4      1475       111.48
Michela Taufer  5      352        53.04