Title
MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data
Abstract
To cluster incomplete categorical matrix data sets while accounting for the uncertain relationship between samples and clusters, this paper proposes a set pair k-modes clustering algorithm (MD-SPKM). First, the theory of set pair information granules is introduced into k-modes clustering. By improving the distance formula of the traditional k-modes algorithm, a set pair distance measure between incomplete matrix samples is defined. Second, to capture the uncertain relationship between a sample and a cluster, the paper defines the intra-cluster average distance and a threshold formula that determines whether a sample belongs to multiple clusters. The set pair clustering result is then formed as a positive region, a boundary region, and a negative region. Finally, experiments on three selected data sets against four comparison algorithms show that the set pair k-modes clustering algorithm handles incomplete categorical matrix data sets effectively. It also performs well in terms of Accuracy, Recall, ARI, and NMI.
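The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the general idea, built on several assumptions.

Representation. Each sample is assumed to be a matrix of categorical values, with `None` marking a missing cell.

Comparison. The comparison borrows the identity, discrepancy and contrary degrees (a, b, c) of set pair analysis, assigned cell by cell:
- A missing cell counts as uncertain.
- An equal pair counts as identical.
- A differing pair counts as contrary.

Distance and threshold. The distance `c + weight * b` is hypothetical, and so is the rule that uses a threshold to place a sample in several clusters. The same applies to every helper name (`set_pair_degrees`, `set_pair_distance`, `cell_mode`, `intra_cluster_avg_distance`, `three_way_assign`) and to the `weight` and `threshold` parameters. None of these is the paper's published method.

```python
from collections import Counter
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical representation: a sample is a matrix (list of rows) of
# categorical values, with None marking a missing cell.
Matrix = List[List[Optional[str]]]


def set_pair_degrees(x: Matrix, y: Matrix) -> Tuple[float, float, float]:
    """Return (a, b, c): the fractions of cells that are identical (a),
    uncertain because at least one side is missing (b), and contrary,
    i.e. both present but different (c)."""
    same = uncertain = contrary = 0
    for row_x, row_y in zip(x, y):
        for u, v in zip(row_x, row_y):
            if u is None or v is None:
                uncertain += 1
            elif u == v:
                same += 1
            else:
                contrary += 1
    n = same + uncertain + contrary
    if n == 0:
        raise ValueError("cannot compare empty matrices")
    return same / n, uncertain / n, contrary / n


def set_pair_distance(x: Matrix, y: Matrix, weight: float = 0.5) -> float:
    """Hypothetical distance: contrary cells count fully and uncertain cells
    count with `weight` in [0, 1]. Not the paper's published formula."""
    _, b, c = set_pair_degrees(x, y)
    return c + weight * b


def cell_mode(samples: Sequence[Matrix]) -> Matrix:
    """k-modes style centre: the most frequent observed value in each cell,
    ignoring missing values (None if a cell is missing in every sample)."""
    rows, cols = len(samples[0]), len(samples[0][0])
    mode: Matrix = []
    for r in range(rows):
        mode_row: List[Optional[str]] = []
        for col in range(cols):
            counts = Counter(s[r][col] for s in samples if s[r][col] is not None)
            mode_row.append(counts.most_common(1)[0][0] if counts else None)
        mode.append(mode_row)
    return mode


def intra_cluster_avg_distance(members: Sequence[Matrix], mode: Matrix) -> float:
    """Mean set pair distance from a cluster's members to its mode."""
    return sum(set_pair_distance(m, mode) for m in members) / len(members)


def three_way_assign(
    samples: Sequence[Matrix], modes: Sequence[Matrix], threshold: float
) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Put each sample in its nearest cluster and in any other cluster whose
    distance is within `threshold` of the nearest. Per cluster, return the
    positive region (sample belongs to that cluster only), the boundary
    region (sample shared with other clusters) and the negative region
    (sample not assigned to that cluster)."""
    memberships: List[Set[int]] = []
    for s in samples:
        dists = [set_pair_distance(s, m) for m in modes]
        best = min(dists)
        memberships.append({k for k, d in enumerate(dists) if d - best <= threshold})
    k_range = range(len(modes))
    positive = [[i for i, m in enumerate(memberships) if m == {k}] for k in k_range]
    boundary = [[i for i, m in enumerate(memberships) if k in m and len(m) > 1] for k in k_range]
    negative = [[i for i, m in enumerate(memberships) if k not in m] for k in k_range]
    return positive, boundary, negative
```

How the pieces might fit together:
- A full run would alternate `three_way_assign` with `cell_mode` updates until memberships stabilise, as in ordinary k-modes.
- `threshold` could be derived from `intra_cluster_avg_distance`, for instance as a fraction of it. This is also an assumption.

Treating missing cells as "uncertain" rather than "different" keeps incomplete samples from being pushed away from every centre. That is the intuition the set pair framing suggests.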
Year: 2021
DOI: 10.3233/IDA-205340
Venue: INTELLIGENT DATA ANALYSIS
Keywords: Incomplete categorical matrix data, set pair information granule, k-modes, set pair distance, set pair k-modes
DocType: Journal
Volume: 25
Issue: 6
ISSN: 1088-467X
Citations: 0
PageRank: 0.34
References: 0
Authors: 7
Name | Order | Citations | PageRank
Chunying Zhang | 1 | 4 | 4.17
Ruiyan Gao | 2 | 0 | 0.34
Jiahao Wang | 3 | 12 | 4.12
Song Chen | 4 | 0 | 0.34
Fengchun Liu | 5 | 0 | 1.01
Jing Ren | 6 | 0 | 1.35
Xiaoze Feng | 7 | 0 | 1.01