Title
Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set.
Abstract
Identifying protein-protein interactions (PPIs) is essential for elucidating protein functions and understanding the molecular mechanisms inside the cell. However, experimental methods for detecting PPIs are both time-consuming and expensive. Computational prediction of protein interactions is therefore becoming increasingly popular. It offers an inexpensive way to predict the most likely set of interactions across the entire proteome and can complement experimental approaches. Although much progress has been made in this direction, the problem is far from solved, and new approaches are still needed to overcome the limitations of current prediction models.
In this work, a sequence-based approach is developed by combining a novel Multi-scale Continuous and Discontinuous (MCD) feature representation with a Support Vector Machine (SVM). The MCD representation accounts for interactions between amino acid residues that are distant in sequence but close in space, so it can capture multiple overlapping continuous and discontinuous binding patterns within a protein sequence. The minimum Redundancy Maximum Relevance (mRMR) feature selection method is then used to build an optimized, more discriminative feature set by excluding redundant features. Finally, an SVM-based prediction model is trained and tested to predict the interaction probability of protein pairs.
On the yeast PPI data set, the proposed approach achieved 91.36% prediction accuracy, with 91.94% precision at a sensitivity of 90.67%. Extensive experiments were conducted to compare our method with existing sequence-based methods. The results show that our predictor outperforms several other state-of-the-art predictors, whose average prediction accuracy, sensitivity, and precision are 84.91%, 83.24%, and 86.12%, respectively.
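To make the idea of continuous and discontinuous regions concrete, here is a minimal Python sketch of an MCD-style encoding. It is not the authors' released code, and several details are assumptions rather than facts from the abstract. The first is the 7-class amino acid grouping by dipole and side-chain volume, borrowed from conjoint-triad work. The second is the split into four equal segments. The third is the particular list of continuous (adjacent) and discontinuous (non-adjacent) regions. The last is the use of composition, transition and distribution (CTD) descriptors for each region. The names `GROUPS`, `ctd` and `mcd_features` are hypothetical.

```python
import math
from itertools import combinations

# ASSUMPTION: 7 amino-acid classes grouped by dipole and side-chain volume
# (conjoint-triad style); the paper's exact grouping may differ.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}
PAIR_INDEX = {p: i for i, p in enumerate(combinations(range(7), 2))}  # 21 class pairs

# HYPOTHETICAL region layout over four equal segments (0..3): continuous regions
# are runs of adjacent segments, discontinuous regions skip at least one segment.
CONTINUOUS = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3),
              (0, 1, 2), (1, 2, 3), (0, 1, 2, 3)]
DISCONTINUOUS = [(0, 2), (0, 3), (1, 3), (0, 1, 3), (0, 2, 3)]


def ctd(classes):
    """Composition (7) + transition (21) + distribution (35) descriptors of one region."""
    n = len(classes)
    composition = [classes.count(c) / n for c in range(7)]
    transition = [0.0] * len(PAIR_INDEX)
    for a, b in zip(classes, classes[1:]):
        if a != b:  # a neighbouring pair that switches between two classes
            transition[PAIR_INDEX[(min(a, b), max(a, b))]] += 1
    transition = [t / max(n - 1, 1) for t in transition]
    distribution = []
    for c in range(7):
        positions = [i + 1 for i, x in enumerate(classes) if x == c]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                distribution.append(0.0)
            else:
                # relative position of the first, 25%, 50%, 75% and last occurrence
                k = max(1, math.ceil(frac * len(positions)))
                distribution.append(positions[k - 1] / n)
    return composition + transition + distribution


def mcd_features(sequence):
    """Concatenate CTD descriptors over all continuous and discontinuous regions."""
    # non-standard residue letters (e.g. X) are simply skipped
    classes = [CLASS_OF[aa] for aa in sequence.upper() if aa in CLASS_OF]
    n = len(classes)
    if n < 4:
        raise ValueError("sequence too short to split into four segments")
    bounds = [i * n // 4 for i in range(5)]
    segments = [classes[bounds[i]:bounds[i + 1]] for i in range(4)]
    features = []
    for region in CONTINUOUS + DISCONTINUOUS:
        # joining non-adjacent segments also counts the transition across the
        # junction, a simplification of this sketch
        merged = [c for s in region for c in segments[s]]
        features.extend(ctd(merged))
    return features
```

In this sketch, 15 regions with 63 values each give 945 numbers per protein. A protein pair would typically be represented by concatenating the two partners' vectors before feature selection and SVM training, which are not shown here.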
The results obtained on the yeast data set show that the proposed approach is promising for PPI prediction and can serve as a useful supplementary tool for future proteomics studies. The source code and the datasets are freely available at http://csse.szu.edu.cn/staff/youzh/MCDPPI.zip for academic use.
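The mRMR step in the abstract can be sketched as a greedy search. At each step it adds the feature whose mutual information with the class label, minus its mean mutual information with the features already chosen, is largest (the difference form of the criterion). The sketch below assumes the features have already been discretized. Continuous MCD values would first need binning, which is not shown. The helper names `mutual_info` and `mrmr` are hypothetical, and the number of features the authors actually kept is not stated in the abstract.

```python
import math
from collections import Counter


def mutual_info(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr(columns, labels, k):
    """Greedy mRMR (difference form): relevance to labels minus mean redundancy.

    columns: list of discretized feature columns; returns indices in selection order.
    """
    if k <= 0 or not columns:
        return []
    relevance = [mutual_info(col, labels) for col in columns]
    # start from the single most relevant feature
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy = sum(mutual_info(columns[j], columns[s])
                             for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Because redundancy is penalized, an exact duplicate of an already chosen feature scores poorly even when it is highly relevant on its own. That is how the step removes redundant features before the SVM is trained.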
Year: 2014
DOI: 10.1186/1471-2105-15-S15-S9
Venue: BMC Bioinformatics
Keywords: SACCHAROMYCES-CEREVISIAE, CLASSIFICATION, COMPLEXES, REPRESENTATION, HYPERPLANES, INFORMATION, SITES
Field: Protein–protein interaction, Matthews correlation coefficient, Biology, Amino acid, Support vector machine, Proteome, Feature set, Saccharomyces cerevisiae Proteins, Bioinformatics, Genetics, DNA microarray
Volume: 15 Suppl 15
Issue: S-15
ISSN: 1471-2105
Citations: 22
PageRank: 0.86
References: 16
Authors: 6
Name, Order, Citations, PageRank
Zhuhong You, 1, 748, 55.20
Lin Zhu, 2, 74, 4.93
Chun-hou Zheng, 3, 732, 71.79
Hong-Jie Yu, 4, 22, 0.86
Su-Ping Deng, 5, 22, 0.86
Zhen Ji, 6, 163, 6.85