Title
Clustering Ensembles Based on Probability Density Function Estimation
Abstract
Traditional clustering algorithms such as DBSCAN rely heavily on the geometric distance between objects, so they are unlikely to handle uncertain objects that are geometrically indistinguishable. To reduce this reliance, a clustering-ensemble model based on probability density function estimation is proposed. Each object is assigned to its most appropriate cluster by mining the distribution of the data in the base clusterings, fully considering the relationships between objects, and finding the probability density function that fits the dataset. Generating base clusterings through data sampling makes it relatively simple to extract potential relationships between objects, because of the random combinations involved, and is also beneficial for clustering large real-world datasets. First, a certain proportion of samples is extracted from the dataset and the k-means algorithm is run on it; this step is repeated to generate the base clusterings. A set of binary object-cluster association matrices is then constructed to summarize the base clusterings. Inspired by the Bayesian classifier, each object is assigned to its corresponding cluster according to the prior probabilities and posterior probabilities of the clusters. Secondly, the discrete probability distributions of the data objects conditioned on each object's label, which are the posterior probabilities of the clusters, can be obtained easily from the matrix, so the remaining task addressed in this paper is to estimate the probability distributions of the object labels. To address this, the well-known kernel density estimation method is adopted to obtain the cluster probability distributions. Finally, the data objects are assigned to their most likely clusters by evaluating the Bayesian formula.
The proposed method was compared experimentally with other clustering-ensemble algorithms on several datasets, and the results demonstrate its effectiveness, efficiency, and scalability.
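The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is one possible reading of the abstract, not the authors' implementation. Several pieces are assumptions the abstract does not specify: unsampled objects are assigned to the nearest k-means centroid, the initial consensus labels come from k-means on the rows of the association matrix, the kernel density estimate is an isotropic Gaussian computed over those rows, and all function names and the parameters `n_base`, `ratio`, and `bandwidth` are hypothetical.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest centroid for every row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns the final centroids."""
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def association_matrix(X, k, n_base, ratio, rng):
    """Binary object-cluster matrix: one block of k columns per base clustering."""
    n = len(X)
    blocks = []
    for _ in range(n_base):
        sample = rng.choice(n, size=int(ratio * n), replace=False)
        centers = kmeans(X[sample], k, rng)
        # Assumption: every object (sampled or not) joins its nearest centroid.
        labels = nearest(X, centers)
        blocks.append(np.eye(k, dtype=int)[labels])
    return np.hstack(blocks)

def kde_logpdf(points, queries, bandwidth):
    """Log of an isotropic Gaussian kernel density estimate at each query row."""
    dim = points.shape[1]
    sq = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    log_k = -0.5 * sq / bandwidth**2 - 0.5 * dim * np.log(2 * np.pi * bandwidth**2)
    m = log_k.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()

def ensemble_cluster(X, k, n_base=10, ratio=0.6, bandwidth=1.0, seed=0):
    """Return the association matrix and Bayes-rule cluster assignments."""
    rng = np.random.default_rng(seed)
    A = association_matrix(X, k, n_base, ratio, rng)
    F = A.astype(float)
    # Assumption: initial consensus labels from k-means on the rows of A.
    init = nearest(F, kmeans(F, k, rng))
    log_score = np.full((len(X), k), -np.inf)
    for j in range(k):
        members = F[init == j]
        if len(members) == 0:
            continue  # an empty cluster keeps a score of -inf
        log_prior = np.log(len(members) / len(X))
        # Bayes rule up to a constant: log prior + log class-conditional density.
        log_score[:, j] = log_prior + kde_logpdf(members, F, bandwidth)
    return A, log_score.argmax(axis=1)

# Toy data: three Gaussian blobs in the plane.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
A, labels = ensemble_cluster(X, k=3)
print(A.shape, np.bincount(labels, minlength=3))
```

Working in log space keeps the product of prior and density from underflowing when the association matrix has many columns. The evidence term of the Bayesian formula is the same for every cluster, so it is omitted from the arg-max.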
Year
2020
DOI
10.1109/CSCloud-EdgeCom49738.2020.00029
Venue
2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom)
Keywords
clustering ensembles, base clustering, probability density function, probability distribution, kernel density estimation
DocType
Conference
ISBN
978-1-7281-6551-6
Citations
0
PageRank
0.34
References
10
Authors
3
Name                   Order   Citations   PageRank
Yingyan Wu             1       0           0.34
Yu-Lin He              2       90          6.31
Joshua Zhexue Huang    3       1365        82.64