Title
SPHot: Prediction of Hot Spots in Protein-RNA Complexes by Protein Sequence Information and Ensemble Classifier
Abstract
RNA-binding hot spots are a small and complementary set of interfacial residues that contribute most to the binding energy of protein-RNA interfaces. As experimental methods for identifying hot spots are time-consuming, labor-intensive and costly, there is great interest in computational approaches that can predict hot spots on a large scale. In this paper, we introduce a sequence-based method that uses an ensemble classifier to predict hot spots in protein-RNA complexes. We first employed three different sequence encoding schemes based on the physicochemical properties from the AAindex database, the amino acid substitution matrix (BLOSUM62), and the predicted relative accessible surface area. Based on these sequence features, 249 individual predictors were developed to identify hot spots using the radial basis function (RBF)-based support vector machine (SVM), the sigmoid-based SVM, and the k-nearest neighbor (k-NN) algorithm. Combinations of these individual predictors by majority voting were explored comprehensively, and a voting ensemble composed of 43 individual predictors was selected as the final ensemble classifier. The ensemble classifier outperformed state-of-the-art computational methods, yielding an F1 score of 0.843 and an AUC of 0.893 on the training set, and an F1 score of 0.814 and an AUC of 0.842 on the test set. The data and source code are available at http://bioinfo.ahu.edu.cn:8080/SPHot.
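The majority-voting step and the F1 score mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `majority_vote`, `f1_score`, and `toy_predictors` are hypothetical, and the three threshold rules merely stand in for trained SVM and k-NN models.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a predictor maps a feature vector to
# 1 (hot spot) or 0 (non-hot spot).
Predictor = Callable[[Sequence[float]], int]


def majority_vote(predictors: Sequence[Predictor],
                  features: Sequence[float]) -> int:
    """Return 1 if strictly more than half of the predictors vote 1."""
    votes = [p(features) for p in predictors]
    # With an odd number of voters (such as the 43 in the abstract) no tie
    # can occur; with an even number, a tie falls back to 0 in this sketch.
    return 1 if 2 * sum(votes) > len(votes) else 0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for trained classifiers (assumption: two features per residue).
toy_predictors: List[Predictor] = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.2),
]

if __name__ == "__main__":
    samples = [[0.9, 0.8], [0.9, 0.1], [0.2, 0.3]]
    labels = [1, 1, 0]
    predictions = [majority_vote(toy_predictors, s) for s in samples]
    print(predictions, f1_score(labels, predictions))
```

In practice each predictor would be a fitted model trained on one of the sequence encodings, and the subset of predictors entering the vote would be chosen by evaluating candidate combinations on held-out data.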
Year: 2019
DOI: 10.1109/ACCESS.2019.2931552
Venue: IEEE ACCESS
Keywords: Protein-RNA complexes, hot spot, ensemble approach, protein sequence
DocType: Journal
Volume: 7
ISSN: 2169-3536
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order   Citations   PageRank
Sijia Zhang     1       2           2.05
Le Zhao         2       1           3.05
Junfeng Xia     3       144         20.14