Title
Using permutation tests to study how the dimensionality, the number of classes, and the number of samples affect classification analysis
Abstract
Permutation tests have been used extensively to estimate the significance of classification. They usually use the test error as a dataset statistic to measure the difference between two or more populations. To estimate the p-value(s), the test error is then compared to a set of permuted test errors, usually obtained after permuting the labels of the populations. In this study, we investigate how several dataset factors, e.g., the number of samples, the number of classes, and the dimensionality, may affect the p-value obtained via permutation tests. We performed the analysis using the standard permutation test procedure, which uses the overall test error as the dataset statistic, and compared it to the permutation test procedure that uses the per-class test error as a dataset statistic, which we recently proposed (doi:10.1016/j.neucom.2011.11.007). We found that permutation tests that use a per-class test error as a dataset statistic are not only more reliable in addressing the null hypothesis but are also highly sensitive to changes in the dataset factors investigated in this work. An important finding of this study is that when the dimensionality is low and the number of classes is up to several, say ten, an accuracy well above chance is required to claim significance. For the same low dimensionality, however, an accuracy only slightly above chance is adequate to claim significance in a two-class problem.
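The procedure the abstract describes (compute a test error, recompute it many times under permuted labels, and report how often the permuted error is at least as good) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-centroid classifier, the synthetic Gaussian data, the choice to shuffle labels within the training and test splits separately, the (count + 1) / (n_perm + 1) p-value estimator, and all function names are assumptions made for the example.

```python
import random


def nearest_centroid_errors(train, test, n_classes):
    """Fit class centroids on `train`; return (overall error, per-class errors) on `test`.

    Assumes every class occurs at least once in both splits.
    """
    dim = len(train[0][0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in train:
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    centroids = [[s / c for s in row] for row, c in zip(sums, counts)]

    wrong = [0] * n_classes
    total = [0] * n_classes
    for x, y in test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
        predicted = dists.index(min(dists))
        total[y] += 1
        wrong[y] += predicted != y
    overall = sum(wrong) / len(test)
    per_class = [w / t for w, t in zip(wrong, total)]
    return overall, per_class


def permute_labels(split, rng):
    """Return a copy of `split` with labels shuffled (class counts are preserved)."""
    labels = [y for _, y in split]
    rng.shuffle(labels)
    return [(x, y) for (x, _), y in zip(split, labels)]


def permutation_test(train, test, n_classes, n_perm=200, seed=0):
    """Estimate p-values for the overall and the per-class test error."""
    rng = random.Random(seed)
    obs_overall, obs_per_class = nearest_centroid_errors(train, test, n_classes)
    count_overall = 0
    count_per_class = [0] * n_classes
    for _ in range(n_perm):
        err, per_class = nearest_centroid_errors(
            permute_labels(train, rng), permute_labels(test, rng), n_classes
        )
        # Count permutations whose error is at least as low as the observed one.
        count_overall += err <= obs_overall
        for k in range(n_classes):
            count_per_class[k] += per_class[k] <= obs_per_class[k]
    p_overall = (count_overall + 1) / (n_perm + 1)
    p_per_class = [(c + 1) / (n_perm + 1) for c in count_per_class]
    return p_overall, p_per_class


def make_data(n_per_class, n_classes, dim, shift, rng):
    """Synthetic Gaussian classes; class k is shifted by `shift * k` on feature 0."""
    return [
        ([rng.gauss(shift * k if j == 0 else 0.0, 1.0) for j in range(dim)], k)
        for k in range(n_classes)
        for _ in range(n_per_class)
    ]


rng = random.Random(1)
train = make_data(20, 2, 2, 6.0, rng)
test = make_data(20, 2, 2, 6.0, rng)
p_overall, p_per_class = permutation_test(train, test, n_classes=2)
print(p_overall, p_per_class)
```

Varying `n_per_class`, `n_classes`, and `dim` in `make_data` is one way to explore, on toy data, the kind of dependence on dataset factors that the study investigates; the sketch makes no claim about reproducing the reported findings.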
Year
2012
DOI
10.1007/978-3-642-31295-3_5
Venue
ICIAR
Keywords
dimensionality size, permutation test procedure, standard permutation test procedure, chance accuracy, per-class test error, test error dataset statistic, dataset factor, test error, permutation test, dataset statistic, classification analysis
Field
Statistic, Pattern recognition, Null hypothesis, Permutation, Exact test, Curse of dimensionality, Multivariate normal distribution, Artificial intelligence, Statistics, Resampling, Mathematics
DocType
Conference
Volume
7324
ISSN
0302-9743
Citations
1
PageRank
0.38
References
4
Authors
2
Name
Order
Citations
PageRank
Mohammed Sadeq Al-Rawi111.06
Silva Cunha, J.P.25918.44