Title
Protein Structure Modeling in a Grid Computing Environment.
Abstract
Advances in sequencing technology have resulted in an exponential increase in the availability of protein sequence information. To fully utilize this information, it is important to translate the primary sequences into high-resolution tertiary protein structures. MODELLER is a leading homology modeling method that produces high-quality protein structures. In this study, the function of MODELLER was expanded by configuring and deploying it on a parallel grid computing platform using a custom four-step workflow. The workflow consisted of template selection through a protein BLAST algorithm, target-template protein sequence alignment, distribution of model generation jobs among the compute clusters, and final protein model optimization. To test the validity of this workflow, we used the Dual Specificity Phosphatase (DSP) protein family, whose members share high homology with one another. Comparison of the DSP member SSH-2 with its model counterpart revealed a minimal 1.3% difference in output energy scores. Furthermore, the Dali Pairwise Comparison Program demonstrated a 98% match among amino acid features and a Z-score of 26.6, indicating very significant similarities between the model and the actual protein structure. After confirming the accuracy of our workflow, we generated 23 previously unknown DSP family protein structure models. Over 40,000 models were generated 30 times faster than with conventional computing. Virtual receptor-ligand screening results of the modeled protein DSP21 were compared with two known structures that had either higher or lower structural homology to DSP21. There was a significant difference (p<0.001) between the average ligand ranking discrepancy of a more homologous protein pair and that of a less homologous protein pair, suggesting that the protein models generated were sufficiently accurate for virtual screening.
These results demonstrate the accuracy and usability of a grid-enabled MODELLER program and the increased efficiency of processing protein structure models. This workflow will help increase the speed of future drug development pipelines.
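The third and fourth workflow steps named in the abstract (distributing model generation jobs among the compute clusters, then choosing a final model) can be illustrated with a minimal Python sketch. This is not the authors' scheduler and does not call MODELLER or any grid middleware. The names `ModelJob`, `split_model_jobs`, and `pick_best_model` are hypothetical, the even contiguous split is an assumed policy, and "lower energy score is better" is an assumption modeled on MODELLER's objective function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelJob:
    """One batch of model indices assigned to a compute cluster (hypothetical type)."""
    cluster: str
    first_model: int  # inclusive, 1-based
    last_model: int   # inclusive


def split_model_jobs(n_models, clusters):
    """Divide model indices 1..n_models into contiguous, near-equal ranges.

    Assumed policy: each cluster gets floor(n/k) models and the first
    (n mod k) clusters get one extra. Clusters with no work are skipped.
    """
    if n_models < 0:
        raise ValueError("n_models must be non-negative")
    if not clusters:
        raise ValueError("at least one cluster is required")
    base, extra = divmod(n_models, len(clusters))
    jobs = []
    start = 1
    for i, cluster in enumerate(clusters):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        jobs.append(ModelJob(cluster, start, start + size - 1))
        start += size
    return jobs


def pick_best_model(scores):
    """Return the model name with the lowest energy score.

    Assumes lower scores indicate better models.
    """
    if not scores:
        raise ValueError("no scores to compare")
    return min(scores, key=scores.get)
```

For example, `split_model_jobs(10, ["a", "b", "c"])` assigns models 1-4, 5-7, and 8-10 to the three clusters. Each range could then be passed to a separate modeling run on its cluster, with the collected scores fed to `pick_best_model`.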
Year
2013
DOI
10.1109/eScience.2013.15
Venue
eScience
Keywords
protein family,actual protein structure,protein dsp21,protein structure modeling,grid computing environment,final protein model optimization,high-resolution tertiary protein structure,high quality protein structure,protein model,protein blast algorithm,protein pair,processing protein structure model,protein structure,grid computing,proteins,modeling,parallel processing
Field
Data mining,Protein family,Grid computing,Protein sequencing,Computer science,Protein superfamily,MODELLER,Virtual screening,Homology modeling,Protein structure
DocType
Conference
ISSN
2325-372X
Citations
0
PageRank
0.34
References
0
Authors
6
Name             Order  Citations  PageRank
Daniel Li        1      0          0.34
Brian Tsui       2      0          0.68
Charles Xue      3      0          0.34
Jason H. Haga    4      17         2.94
Kohei Ichikawa   5      69         19.79
Susumu Date      6      133        28.14