Title
Randomization methods for assessing data analysis results on real-valued matrices
Abstract
Randomization is an important technique for assessing the significance of data analysis results. Given an input dataset, a randomization method samples at random from some class of datasets that share certain characteristics with the original data. The measure of interest on the original data is then compared to the measure on the samples to assess its significance. For certain types of data, e.g., gene expression matrices, it is useful to be able to sample datasets that have the same row and column distributions of values as the original dataset. Testing whether the results of a data mining algorithm on such randomized datasets differ from the results on the true dataset tells us whether the results on the true data were an artifact of the row and column statistics or due to some more interesting phenomena in the data. We study the problem of generating such randomized datasets. We describe methods based on local transformations and Metropolis sampling, and show that the methods are efficient and usable in practice. We evaluate the performance of the methods on both real and generated data. We also show how our methods can be applied to a real data analysis scenario on DNA microarray data. The results indicate that the methods work efficiently and are usable in significance testing of data mining results on real-valued matrices.
Copyright © 2009 Wiley Periodicals, Inc., A Wiley Company
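The abstract names the ingredients (local transformations, Metropolis sampling, comparing a measure on the original data against randomized samples) without giving details. The block below is a minimal sketch under stated assumptions, not the authors' algorithm. The local transformation is assumed to be a swap of two entries within one column, which keeps every column's multiset of values exactly. Row value distributions are kept only approximately by targeting a density proportional to exp(-w * E), where E (the summed L1 distance between each row's sorted values and the original row's sorted values) and the weight w are illustrative choices. The function names `row_cost`, `metropolis_randomize`, and `empirical_p_value` are hypothetical, and the p-value uses the common (hits + 1) / (n + 1) convention.

```python
import math
import random


def row_cost(row, ref_sorted):
    """L1 distance between the sorted values of `row` and a sorted reference row."""
    return sum(abs(a - b) for a, b in zip(sorted(row), ref_sorted))


def metropolis_randomize(matrix, steps, w=10.0, rng=None):
    """Return a randomized copy of `matrix` (equal-length float rows, at least two rows).

    Assumed scheme for illustration: propose swapping two entries of one column,
    so column value distributions are preserved exactly; accept with the
    Metropolis rule for a target proportional to exp(-w * E), where E is the
    summed row_cost against the original rows.
    """
    if rng is None:
        rng = random.Random()
    m = [list(r) for r in matrix]
    n_rows, n_cols = len(m), len(m[0])
    ref = [sorted(r) for r in matrix]
    cost = [row_cost(m[i], ref[i]) for i in range(n_rows)]
    for _ in range(steps):
        j = rng.randrange(n_cols)
        i1, i2 = rng.sample(range(n_rows), 2)
        # Local transformation: swap two entries of column j.
        m[i1][j], m[i2][j] = m[i2][j], m[i1][j]
        # Only rows i1 and i2 change, so the energy difference is computed locally.
        new1, new2 = row_cost(m[i1], ref[i1]), row_cost(m[i2], ref[i2])
        delta = (new1 + new2) - (cost[i1] + cost[i2])
        # The proposal is symmetric, so the plain Metropolis acceptance rule applies.
        if delta <= 0 or rng.random() < math.exp(-w * delta):
            cost[i1], cost[i2] = new1, new2
        else:
            m[i1][j], m[i2][j] = m[i2][j], m[i1][j]  # reject: undo the swap
    return m


def empirical_p_value(matrix, measure, n_samples=100, steps=1000, w=10.0, seed=0):
    """One-sided empirical p-value: (1 + #{samples with measure >= observed}) / (n_samples + 1)."""
    rng = random.Random(seed)
    observed = measure(matrix)
    hits = sum(
        measure(metropolis_randomize(matrix, steps, w, rng)) >= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    data = [[float(i * 4 + j) for j in range(4)] for i in range(5)]

    def largest_row_mean(m):
        # Hypothetical placeholder for the result of a data mining algorithm.
        return max(sum(r) / len(r) for r in m)

    print(empirical_p_value(data, largest_row_mean, n_samples=99, steps=500, w=1.0))
```

A small p-value suggests that the observed measure is unlikely to arise from the row and column value distributions alone. Raising w keeps the rows closer to their original value distributions but makes swaps harder to accept, so the chain mixes more slowly. Each sample here comes from a chain started at the original matrix, so the test should be read as approximate.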
Year
2009
DOI
10.1002/sam.v2:4
Venue
Statistical Analysis and Data Mining
Keywords
DNA microarray, data analysis, data mining, Markov chain Monte Carlo
Field
Usable, Data mining, Significance testing, Markov chain Monte Carlo, Uniformization (probability theory), Matrix (mathematics), Computer science, Data type, Randomization, Artificial intelligence, Sampling (statistics), Machine learning
DocType
Journal
Volume
2
Issue
4
Citations
14
PageRank
0.80
References
17
Authors
5
Name            Order  Citations  PageRank
Markus Ojala    1      103        6.03
Niko Vuokko     2      85         5.27
Aleksi Kallio   3      85         5.75
Niina Haiminen  4      81         9.71
Heikki Mannila  5      6595       1495.69