Title
Normalized Compression Distance of Multiples
Abstract
Normalized compression distance (NCD) is a parameter-free similarity measure based on compression. The NCD between pairs of objects is not sufficient for all applications. We propose an NCD of finite multisets (multiples) of objects that is a metric and is better for many applications. Previously, attempts to obtain such an NCD failed. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated from above by the length of the compressed version of the file involved, using a real-world compression program. We applied the new NCD for multiples to retinal progenitor cell questions that were earlier treated with the pairwise NCD, and obtained significantly better results. We also applied the NCD for multiples to synthetic time sequence data. The preliminary results are as good as those of a nearest neighbor Euclidean classifier.
Index Terms: Normalized compression distance, multisets or multiples, pattern recognition, data mining, similarity, Kolmogorov complexity, retinal progenitor cell classification, synthetic data classification
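As a rough illustration of the approximation described in the abstract, the following minimal sketch computes the standard pairwise NCD, with the compressed length standing in for Kolmogorov complexity. The choice of zlib as the real-world compressor, the helper names `compressed_len` and `ncd`, and the sample byte strings are assumptions made for this example; the abstract names no compressor. The sketch covers only the pairwise case, not the multiset distance proposed in the paper.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed data: an upper-bound proxy for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # compress the concatenation of both objects
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample data: two near-identical texts and one unrelated byte pattern.
a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4

if __name__ == "__main__":
    print("ncd(a, b) =", ncd(a, b))  # similar inputs: expected to be small
    print("ncd(a, c) =", ncd(a, c))  # dissimilar inputs: expected to be larger
```

Values near 0 indicate that one object adds little information once the other is known; values near 1 indicate largely unrelated objects. With a real compressor the result can slightly exceed 1 and is not exactly 0 for identical inputs.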
Year
2012
Venue
CoRR
Field
k-nearest neighbors algorithm, Pairwise comparison, Similarity measure, Kolmogorov complexity, Normalized compression distance, Algorithm, Synthetic data, Euclidean geometry, Classifier (linguistics), Mathematics
DocType
Journal
Volume
abs/1212.5711
Citations
0
PageRank
0.34
References
6
Authors
2
Name               Order   Citations   PageRank
Andrew R. Cohen    1       63          12.00
Paul Vitányi       2       2130        287.76