Title
What is the Dimension of Your Binary Data?
Abstract
Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, as such the fractal dimension is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension counts the number of independent columns needed to achieve the unnormalized fractal dimension of D. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
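The abstract does not spell out the formal definition, so the following is only a minimal sketch of one common way to estimate a fractal (correlation) dimension on 0/1 data. It assumes the dimension is read off as the slope of log P(distance <= r) against log r, where the distance between two rows is the Hamming distance. The function names `pair_distance_cdf` and `correlation_dimension` and the choice of radii are illustrative assumptions, not taken from the paper, and the normalization step (matching against independent columns) is not covered here.

```python
import numpy as np

def pair_distance_cdf(data, radii):
    """Fraction of distinct row pairs whose Hamming distance is <= r, for each r."""
    data = np.asarray(data, dtype=np.int64)
    n = data.shape[0]
    # For 0/1 vectors x and y: Hamming(x, y) = |x| + |y| - 2 * (x . y)
    ones = data.sum(axis=1)
    gram = data @ data.T
    dist = ones[:, None] + ones[None, :] - 2 * gram
    # Keep each unordered pair once (strict upper triangle)
    pair_dist = dist[np.triu_indices(n, k=1)]
    return np.array([(pair_dist <= r).mean() for r in radii])

def correlation_dimension(data, radii):
    """Slope of the least-squares line through (log r, log P(dist <= r)).

    Assumption: radii are positive; radii with zero probability are skipped.
    """
    radii = np.asarray(radii, dtype=float)
    probs = pair_distance_cdf(data, radii)
    keep = probs > 0
    if keep.sum() < 2:
        raise ValueError("need at least two radii with nonzero probability")
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(probs[keep]), 1)
    return slope

# Usage: a small random 0/1 dataset with independent columns (illustrative only)
rng = np.random.default_rng(0)
sample = (rng.random((200, 30)) < 0.2).astype(int)
estimate = correlation_dimension(sample, radii=[4, 6, 8, 10, 12])
```

The all-pairs distance matrix keeps the sketch short but needs memory quadratic in the number of rows; for large datasets, sampling row pairs would be a natural substitute.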
Year
2006
DOI
10.1109/ICDM.2006.167
Venue
ICDM
Keywords
binary data, effective dimensionality, dataset d, basic idea, dependency structure, large number, normalized fractal dimension, unnormalized fractal dimension, fractal dimension, nontrivial problem, data mining, data handling, principal component analysis
Field
Fractal analysis, Fractal dimension, Pattern recognition, Curse of dimensionality, Correlation dimension, Artificial intelligence, Binary data, Sufficient dimension reduction, Mathematics, Principal component analysis, Multifractal system
DocType
Conference
ISSN
1550-4786
ISBN
0-7695-2701-9
Citations
27
PageRank
1.79
References
17
Authors
4
Name
Order
Citations
PageRank
Nikolaj Tatti152734.26
Taneli Mielikäinen275939.97
Aristides Gionis36808386.81
Heikki Mannila465951495.69