Title
Decision Tree Based Information Integration for Automated Protein Classification
Abstract
We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. We achieve accurate classification by combining the decisions of multiple methods using the consensus of a committee (or an ensemble) classifier. Our technique, based on decision trees, is rooted in machine learning, which shows that by judiciously employing component classifiers, an ensemble classifier can be constructed to outperform its components. We use two sequence- and three structure-comparison tools as component classifiers. Given a protein structure, using the joint hypothesis, we first determine if the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted as members of the existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3-12 times lower than the individual classifiers' error rates at the family level, 1.5-4.5 times lower at the superfamily level, and 1.1-2.4 times lower at the fold level.
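The two-stage idea in the abstract (first decide whether a protein fits an existing category, then assign a label by consensus) can be illustrated with a minimal sketch. This is not the paper's decision-tree method: it substitutes a plain weighted vote, and the tool names, scores, weights, and the `novelty_threshold` parameter are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict


def consensus_label(predictions, weights, novelty_threshold=0.4):
    """Combine component classifier outputs by weighted vote.

    predictions: dict mapping a (hypothetical) tool name to a
        (label, score) pair, with score assumed to lie in [0, 1].
    weights: dict mapping tool name to a vote weight (default 1.0).
    Returns the winning label, or None when support is too weak,
    which this sketch treats as "not in an existing category".
    """
    support = defaultdict(float)
    total_weight = 0.0
    for tool, (label, score) in predictions.items():
        w = weights.get(tool, 1.0)
        support[label] += w * score
        total_weight += w

    if total_weight == 0:
        return None

    best_label, best_support = max(support.items(), key=lambda kv: kv[1])
    # Stage 1 stand-in: reject if even the best label has weak support.
    if best_support / total_weight < novelty_threshold:
        return None
    # Stage 2 stand-in: report the consensus label.
    return best_label


# Two sequence tools and three structure tools, all names hypothetical.
example = {
    "seq_tool_a": ("family_1", 0.9),
    "seq_tool_b": ("family_1", 0.8),
    "struct_tool_a": ("family_2", 0.6),
    "struct_tool_b": ("family_1", 0.7),
    "struct_tool_c": ("family_2", 0.5),
}
label = consensus_label(example, weights={})
```

A learned combiner, such as the decision trees the abstract describes, would replace the fixed weights and threshold with rules fitted to training data.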
Year
2005
DOI
10.1142/S0219720005001259
Venue
J. Bioinformatics and Computational Biology
Field
Decision tree, Information integration, SUPERFAMILY, Pattern recognition, Computer science, Artificial intelligence, Hierarchy, Linear classifier, Classifier (linguistics), Machine learning
DocType
Journal
Volume
3
Issue
3
Citations
5
PageRank
0.49
References
4
Authors
4
Name, Order, Citations, PageRank
Orhan Çamoglu, 1, 30, 2.59
Tolga Can, 2, 268, 16.39
Ambuj K. Singh, 3, 2442, 409.85
Yuan-Fang Wang, 4, 835, 137.72