Abstract | ||
---|---|---|
Supervised classifiers, such as artificial neural networks, partition trees, and support vector machines, are often used for the prediction and analysis of biological data. However, choosing an appropriate classifier is not straightforward because each classifier has its own strengths and weaknesses, and each biological dataset has its own characteristics. By integrating many classifiers together, people can avoid the dilemma of choosing an individual classifier out of many to achieve optimized classification results (Rahman et al., Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation, Springer, Berlin, 2002, 167-178). The classification algorithms come from Weka (Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005), a collection of software tools for machine learning algorithms. By integrating many predictors (classifiers) together through simple voting, the correct prediction (classification) rates are 65.21% and 65.63% for a basic training dataset and an independent test set, respectively. These results are better than those of any single machine learning algorithm collected in Weka when exactly the same data are used. Furthermore, we introduce an integration strategy which takes care of both classifier weightings and classifier redundancy. A feature selection strategy, called minimum redundancy maximum relevance (mRMR), is transferred into algorithm selection to deal with classifier redundancy in this research, and the weightings are based on the performance of each classifier. The best classification results are obtained when 11 algorithms are selected by the mRMR method and integrated together through majority votes with weightings. As a result, the correct prediction rates are 68.56% and 69.29% for the basic training dataset and the independent test dataset, respectively.
The web-server is available at http://chemdata.shu.edu.cn/protein_st/. © 2009 Wiley Periodicals, Inc. J Comput Chem 30: 2248-2254, 2009 |
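
The integration strategy described in the abstract (mRMR-style selection of classifiers followed by weighted majority voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which builds on Weka. The function names, the use of mutual information between discrete prediction vectors, the relevance-minus-mean-redundancy score, and the toy data are all assumptions made for illustration. The abstract only says the weightings are based on classifier performance, so the weights in the usage example are arbitrary placeholders.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(predictions, labels, k):
    """Greedily pick k classifiers by relevance minus mean redundancy.

    predictions maps a (hypothetical) classifier name to its predicted
    labels on a validation set; labels holds the true classes.
    """
    relevance = {
        name: mutual_information(preds, labels)
        for name, preds in predictions.items()
    }
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(predictions[name], predictions[s])
            for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(predictions)):
        candidates = [n for n in predictions if n not in selected]
        selected.append(max(candidates, key=score))
    return selected


def weighted_vote(votes, weights):
    """Return the class whose supporting classifiers carry the most total weight."""
    totals = Counter()
    for name, predicted_class in votes.items():
        totals[predicted_class] += weights[name]
    return max(totals, key=totals.get)


# Toy data (assumed): clf_b duplicates clf_a, so it adds no new information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = {
    "clf_a": [0, 0, 0, 0, 1, 1, 1, 0],
    "clf_b": [0, 0, 0, 0, 1, 1, 1, 0],
    "clf_c": [0, 1, 0, 0, 1, 1, 0, 1],
}
chosen = mrmr_select(predictions, labels, k=2)

# Placeholder weights; a real system would derive them from measured performance.
weights = {"clf_a": 0.9, "clf_b": 0.3, "clf_c": 0.4}
decision = weighted_vote({"clf_a": 0, "clf_b": 1, "clf_c": 1}, weights)
```

In this toy run the duplicate classifier is skipped in favour of the less accurate but less redundant one, and the single heavily weighted vote outweighs two lightly weighted votes, which is the behaviour that distinguishes weighted voting from simple voting.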
Year | DOI | Venue |
---|---|---|
2009 | 10.1002/jcc.21230 | JOURNAL OF COMPUTATIONAL CHEMISTRY |
Keywords | Field | DocType |
protein structural class,minimum redundancy maximum relevance,amino acid compositions,multiple classifier integration,Weka | Margin (machine learning),Pattern recognition,Computer science,Support vector machine,Artificial intelligence,Classifier (linguistics),Linear classifier,Statistical classification,Margin classifier,Machine learning,Quadratic classifier,Test set | Journal |
Volume | Issue | ISSN |
30 | 14 | 0192-8651 |
Citations | PageRank | References |
6 | 0.48 | 7 |
Authors | ||
11 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lei Chen | 1 | 6 | 0.48 |
Lin Lu | 2 | 14 | 2.11 |
Kairui Feng | 3 | 7 | 0.84 |
Wenjin Li | 4 | 6 | 0.48 |
Jie Song | 5 | 6 | 0.48 |
Lulu Zheng | 6 | 8 | 1.18 |
Youlang Yuan | 7 | 6 | 0.48 |
Zhenbing Zeng | 8 | 150 | 20.48 |
Kaiyan Feng | 9 | 34 | 4.60 |
Wen-Cong Lu | 10 | 65 | 4.40 |
Yu-Dong Cai | 11 | 340 | 34.45 |