Clustering with Domain Value Dissimilarity for Categorical Data - Citegraph

Paper Info

Title
Clustering with Domain Value Dissimilarity for Categorical Data

Abstract
Clustering is a representative grouping process for uncovering hidden information and understanding the characteristics of a dataset as a basis for further analysis. The similarity and dissimilarity of objects are the fundamental deciding factors in clustering, and how they are measured largely determines the quality of the results. When the attributes of the data are categorical, it is not simple to quantify the dissimilarity of data objects that have unimportant attributes or synonymous values. We propose quantifying the dissimilarity of objects by using the distribution information of the data correlated with each categorical value. Our method discovers intrinsic relationships between values and measures the dissimilarity of objects effectively. Our approach is not tightly coupled to any clustering algorithm and so can be applied flexibly to various algorithms. Experiments on both synthetic and real datasets show the suitability and effectiveness of this method. When our method is applied only to traditional clustering algorithms, the results are considerably better than those of previous methods.
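
The abstract does not state the dissimilarity formula, so the following is only a rough Python sketch of the general idea of comparing categorical values through the distribution of values they co-occur with. It is not the authors' measure. The function names, the toy rows, and the choice of total variation distance are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def cooccurrence_distributions(rows, attr):
    """For each value in column `attr`, build the normalized distribution
    of (column, value) pairs it co-occurs with in the other columns."""
    counts = defaultdict(Counter)
    for row in rows:
        for j, other in enumerate(row):
            if j != attr:
                counts[row[attr]][(j, other)] += 1
    dists = {}
    for value, counter in counts.items():
        total = sum(counter.values())
        dists[value] = {key: c / total for key, c in counter.items()}
    return dists

def value_dissimilarity(p, q):
    """Total variation distance (an assumed choice): 0 means the two values
    have identical co-occurrence profiles, 1 means no overlap at all."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def object_dissimilarity(x, y, dists_per_attr):
    """Average the value dissimilarities over all attributes;
    attributes with equal values contribute 0."""
    total = 0.0
    for a, (vx, vy) in enumerate(zip(x, y)):
        if vx != vy:
            total += value_dissimilarity(dists_per_attr[a][vx],
                                         dists_per_attr[a][vy])
    return total / len(x)

# Hypothetical toy data: "red" and "crimson" act as synonymous values.
rows = [
    ("red", "round", "small"),
    ("crimson", "round", "small"),
    ("red", "round", "small"),
    ("blue", "square", "large"),
    ("blue", "square", "large"),
]
dists = [cooccurrence_distributions(rows, a) for a in range(3)]

d_synonyms = object_dissimilarity(rows[0], rows[1], dists)
d_different = object_dissimilarity(rows[0], rows[3], dists)
```

Because the measure is computed independently of any clustering step, a pairwise matrix built with `object_dissimilarity` could in principle be passed to any algorithm that accepts precomputed dissimilarities, which mirrors the loose coupling the abstract describes. Under simple value matching, the first two rows would differ in one attribute, whereas here the synonymous values contribute nothing.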

Year	DOI	Venue
2009	10.1007/978-3-642-03067-3_25	ICDM
Keywords	Field	DocType
group process,data mining,categorical data,co occurrence,clustering,similarity	Data mining,Fuzzy clustering,CURE data clustering algorithm,Computer science,Consensus clustering,FLAME clustering,Artificial intelligence,Cluster analysis,Single-linkage clustering,Data stream clustering,Pattern recognition,Correlation clustering,Machine learning	Conference
Citations	PageRank	References
5	0.55	10
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Jeong-Hoon Lee	1	291	16.06
Yoonjoon Lee	2	574	175.37
Minho Park	3	68	10.17
