Title
ProtDet-CCH: Protein remote homology detection by combining Long Short-Term Memory and ranking methods.
Abstract
As one of the most challenging tasks in sequence analysis, protein remote homology detection has been extensively studied. Methods based on discriminative models and methods based on ranking approaches have both achieved state-of-the-art performance, and the two kinds of methods are complementary. In this study, three LSTM models (ULSTM, BLSTM, and CNN-BLSTM) were applied to construct predictors for protein remote homology detection. They are able to automatically extract local and global sequence order information. Combined with PSSMs, CNN-BLSTM achieved the best performance among the three LSTM-based models; we named this method CNN-BLSTM-PSSM. Finally, a new method called ProtDet-CCH was proposed by combining CNN-BLSTM-PSSM with the ranking method HHblits. Tested on a widely used SCOP benchmark dataset, ProtDet-CCH achieved an ROC score of 0.998 and an ROC50 score of 0.982, significantly outperforming other existing state-of-the-art methods. Experimental results on two updated SCOPe independent datasets showed that ProtDet-CCH can achieve stable performance. Furthermore, our method can provide useful insights for studying the features and motifs of protein families and superfamilies. It is anticipated that ProtDet-CCH will become a very useful tool for protein remote homology detection.
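The abstract reports ROC and ROC50 scores without defining them. As a rough illustration only, the sketch below computes an ROC-n score in the commonly used sense: the area under the ROC curve up to the first n false positives, scaled to the range 0 to 1. The function name `roc_n`, the handling of datasets with fewer than n negatives, and the treatment of tied scores (kept in input order) are assumptions made for this sketch, not details taken from the paper.

```python
def roc_n(scores, labels, n=50):
    """Area under the ROC curve up to the first n false positives, scaled to [0, 1].

    scores: prediction scores, higher means more likely homologous.
    labels: truthy for true homologs (positives), falsy for non-homologs.
    Assumption: if fewer than n negatives exist, all of them are used,
    which makes the result equal to the ordinary ROC score.
    """
    # Rank all samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    total_neg = len(ranked) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    max_fp = min(n, total_neg)
    tp = fp = area = 0
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
            # Each false positive adds the number of positives ranked above it.
            area += tp
            if fp == max_fp:
                break
    return area / (max_fp * total_pos)
```

For example, `roc_n([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])` gives 0.75, while a ranking that places every positive above every negative gives 1.0.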
Year: 2019
DOI: 10.1109/TCBB.2018.2789880
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Keywords: Proteins, Computer architecture, Benchmark testing, Microprocessors, Logic gates, Databases
Field: Protein family, Logic gate, Ranking, Computer science, Long short term memory, Homology (biology), Artificial intelligence, Discriminative model, Benchmark (computing), Machine learning, Sequence analysis
DocType: Journal
Volume: 16
Issue: 4
ISSN: 1557-9964
Citations: 2
PageRank: 0.40
References: 0
Authors: 2
Name: Bin Liu; Order: 1; Citations: 419; PageRank: 33.30
Name: Li Shumin; Order: 2; Citations: 20; PageRank: 2.02