Title
Out-of-Sample Error Estimation: The Blessing of High Dimensionality
Abstract
Dealing with high dimensionality when learning from data is a tough task since, for example, similarity and correlation in data cannot be properly captured by the conventional notions of distance. The issues are amplified in small-sample problems, i.e. when the cardinality of the dataset is remarkably smaller than its dimensionality: in these cases, a reliable estimate of the accuracy of the trained model on new data is difficult to derive, because standard statistical inference approaches are inefficient in this framework. In this paper, we show that the high dimensionality of data, at least under some assumptions, helps improve the assessment of the performance of a model trained on empirical data in supervised classification tasks. In particular, we propose to create copies of the original dataset in which only subsets of independent and informative features are considered in turn: we show that training and combining a collection of classifiers on these sets helps fill the gap between the true and the estimated error of the models. To verify the potential of the proposed approach and to gain more insight into it, we test the method on both an artificial problem and a series of real-world, high-dimensional human gene expression datasets.
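The abstract describes the approach only at a high level, so the following is a hedged illustration of the general idea, not the authors' procedure or their error estimator. It assumes a toy two-class Gaussian problem in which every feature is independent and mildly informative, splits the features into disjoint blocks (one "copy" of the dataset per block), fits a nearest-centroid classifier on each copy and combines them by majority vote. The function names, the data model, the base learner and the Hoeffding half-width used to show how loose a small-sample estimate is are all assumptions introduced for this sketch.

```python
import math
import random

# Hypothetical sketch: names, toy data model and nearest-centroid learner are
# illustrative assumptions, not the method proposed in the paper.

def make_data(n, d, shift, rng):
    """Balanced two-class Gaussian data; each of the d features is independent
    and shifted by +/- shift depending on the label."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(label * shift, 1.0) for _ in range(d)])
        y.append(label)
    return X, y

def fit_centroids(X, y, cols):
    """Per-class means computed only on the feature subset `cols`."""
    centroids = {}
    for label in (-1, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    return centroids

def predict_centroids(centroids, x, cols):
    """Assign x to the class whose centroid is closest on the subset `cols`."""
    def sq_dist(label):
        return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
    return min((-1, 1), key=sq_dist)

def train_ensemble(X, y, d, k):
    """Split the d features into k disjoint blocks and fit one model per block."""
    blocks = [list(range(i, d, k)) for i in range(k)]
    return [(cols, fit_centroids(X, y, cols)) for cols in blocks]

def predict_ensemble(models, x):
    """Majority vote of the per-block classifiers (k odd, so no ties)."""
    votes = sum(predict_centroids(c, x, cols) for cols, c in models)
    return 1 if votes > 0 else -1

def error_rate(predict, X, y):
    """Fraction of misclassified points."""
    return sum(predict(x) != t for x, t in zip(X, y)) / len(y)

def hoeffding_halfwidth(m, delta=0.05):
    """Two-sided Hoeffding half-width for an error rate estimated on m points,
    holding with probability at least 1 - delta."""
    return math.sqrt(math.log(2 / delta) / (2 * m))

rng = random.Random(0)
d, k = 500, 5  # far more features than training samples
X_train, y_train = make_data(20, d, 0.2, rng)
X_test, y_test = make_data(200, d, 0.2, rng)
models = train_ensemble(X_train, y_train, d, k)
test_err = error_rate(lambda x: predict_ensemble(models, x), X_test, y_test)
print(f"ensemble test error: {test_err:.3f}")
print(f"Hoeffding half-width, 20 points: {hoeffding_halfwidth(20):.3f}")
```

The Hoeffding term makes the small-sample difficulty concrete: with only 20 held-out points and delta = 0.05 the half-width is about 0.30, which is too wide to be informative. Disjoint blocks are used here simply because independent features make the per-block classifiers' errors independent in this toy model.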
Year: 2014
DOI: 10.1109/ICDMW.2014.41
Venue: Data Mining Workshop
Keywords: biology computing, estimation theory, genetics, pattern classification, statistical analysis, cardinality, data correlation, data similarity, estimated error, high dimensionality, out-of-sample error estimation, real-world high dimensional human gene expression dataset, standard statistical inference approach, supervised classification task, Classification, Error Estimation, High Dimensional Problems, Performance Estimation, Small Sample, Supervised Learning
DocType: Conference
Citations: 1
PageRank: 0.34
References: 23
Authors: 6
Name | Order | Citations | PageRank
Luca Oneto | 1 | 830 | 63.22
Alessandro Ghio | 2 | 667 | 35.71
Sandro Ridella | 3 | 677 | 140.62
Jorge Luis Reyes-Ortiz | 4 | 329 | 11.66
Davide Anguita | 5 | 1001 | 70.58
Reyes Ortiz, J.L. | 6 | 1 | 0.34