Title
A comparative analysis of dissimilarity measures for clustering categorical data
Abstract
Similarity and dissimilarity (distance) between objects are important aspects that must be considered when clustering data. When clustering categorical data, for instance, these measures need to properly address the particularities of categorical data. In this paper, we perform a comparative analysis of four dissimilarity measures used as distance metrics for clustering categorical data. The first is the Simple Matching Dissimilarity Measure (SMDM), one of the simplest and most widely used metrics for categorical attributes. The next two are context-based approaches: DIstance Learning in Categorical Attributes (DILCA) and Domain Value Dissimilarity (DVD). The last is an extension of the SMDM, which is proposed in this paper. All four dissimilarities are applied as distance metrics in two well-known clustering algorithms, k-means and agglomerative hierarchical clustering. In this analysis, we also use internal and external cluster validity measures to compare the effectiveness of all four distance measures in both clustering algorithms.
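As a rough illustration of the simplest of the four measures, the sketch below computes a simple matching dissimilarity between categorical records. It assumes the common formulation in which the dissimilarity is the fraction of attributes whose values differ; the abstract does not state whether the paper normalizes the mismatch count, so the normalization, the function names, and the toy records are all assumptions made for illustration. DILCA, DVD, and the proposed SMDM extension are not sketched here because the abstract does not define them.

```python
from typing import List, Sequence


def simple_matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical records disagree.

    Assumption: normalized by the number of attributes; some formulations
    use the raw mismatch count instead.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if len(a) == 0:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def pairwise_dissimilarity(records: Sequence[Sequence[str]]) -> List[List[float]]:
    """Symmetric matrix of simple matching dissimilarities."""
    n = len(records)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = simple_matching_dissimilarity(records[i], records[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical toy records: (color, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]
matrix = pairwise_dissimilarity(records)
```

A matrix of this kind could be passed as a precomputed distance matrix to an agglomerative hierarchical clustering routine. A k-means-style algorithm additionally needs a notion of cluster representative for categorical data, which the abstract does not specify.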
Year
2013
DOI
10.1109/IJCNN.2013.6707039
Venue
Neural Networks
Keywords
data handling, learning (artificial intelligence), pattern clustering, DILCA, DVD, SMDM, categorical data clustering, comparative analysis, context based approaches, dissimilarity measurement, distance learning in categorical attributes, distance metrics, domain value dissimilarity, simple matching dissimilarity measure
Field
Fuzzy clustering, Data mining, Computer science, Categorical variable, Consensus clustering, Artificial intelligence, Cluster analysis, Single-linkage clustering, k-medians clustering, Hierarchical clustering, Pattern recognition, Correlation clustering, Machine learning
DocType
Conference
ISSN
2161-4393
ISBN
978-1-4673-6128-6
Citations
0
PageRank
0.34
References
5
Authors
5