Abstract | ||
---|---|---|
Bioinformatics poses an interesting challenge for data mining algorithms, given the high dimensionality of its data and the comparatively small number of samples. Case-based classification algorithms have been successfully applied to bioinformatics data and often serve as a reference for other algorithms. This paper therefore studies, on some of the most widely benchmarked datasets in bioinformatics, the performance of different reuse strategies in case-based classification, in order to make methodological recommendations for applying these algorithms to this domain. The study concludes that k-nearest-neighbor (kNN) classifiers coupled with between-group to within-group sum of squares (BSS/WSS) feature selection can perform as well as, and even better than, the best benchmarked algorithms to date. However, the choice of reuse strategy played a major role in optimizing the algorithms. In particular, optimizing both the number k of neighbors and the number of features taken into account was key to improving classification accuracy. |
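
The combination described in the abstract, BSS/WSS feature ranking followed by kNN with k and the number of features tuned jointly, can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, Euclidean distance, plain majority voting as the reuse strategy, and leave-one-out cross-validation are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def bss_wss_ratio(X, y):
    """Return the BSS/WSS ratio of every feature (column) of X."""
    overall_mean = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        # Between-group term: class size times squared offset of the class mean.
        bss += Xc.shape[0] * (class_mean - overall_mean) ** 2
        # Within-group term: squared deviations from the class mean.
        wss += ((Xc - class_mean) ** 2).sum(axis=0)
    # Guard against division by zero for constant features.
    return bss / np.maximum(wss, 1e-12)


def select_top_features(X, y, n_features):
    """Indices of the n_features columns with the highest BSS/WSS ratio."""
    return np.argsort(bss_wss_ratio(X, y))[::-1][:n_features]


def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote kNN with Euclidean distance (assumed reuse strategy)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        # On a tie, argmax keeps the smallest label.
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def loocv_accuracy(X, y, k, n_features):
    """Leave-one-out accuracy; features are re-selected inside each fold."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        feats = select_top_features(X[mask], y[mask], n_features)
        pred = knn_predict(X[mask][:, feats], y[mask], X[i:i + 1, feats], k)
        correct += int(pred[0] == y[i])
    return correct / n


def grid_search(X, y, ks, feature_counts):
    """Return (accuracy, k, n_features) for the best pair on the grid."""
    best = None
    for k in ks:
        for m in feature_counts:
            acc = loocv_accuracy(X, y, k, m)
            if best is None or acc > best[0]:
                best = (acc, k, m)
    return best
```

Calling `grid_search(X, y, ks=[1, 3, 5], feature_counts=[10, 50, 100])` on a samples-by-features matrix `X` with a label vector `y` would return the best pair found on that hypothetical grid. Re-selecting the features inside each fold keeps the held-out sample from influencing the ranking.
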
Year | DOI | Venue |
---|---|---|
2011 | 10.1007/978-3-642-23291-6_29 | ICCBR |
Keywords | Field | DocType |
---|---|---|
data mining,benchmarked datasets,case-based classification algorithm,benchmarked algorithm,classification accuracy,case-based classification,number k,different reuse strategy,reuse strategy,bioinformatics data,k nearest neighbor,classification,bioinformatics,feature selection,reuse | Data mining,Feature selection,Computer science,Artificial intelligence,Data mining algorithm,Small set,k-nearest neighbors algorithm,Reuse,Curse of dimensionality,Bioinformatics,Statistical classification,Explained sum of squares,Machine learning | Conference |
Volume | ISSN | Citations |
---|---|---|
6880 | 0302-9743 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 13 | 1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Isabelle Bichindaritz | 1 | 532 | 55.74 |