Title
An Intelligent Clustering Algorithm for High-Dimensional Multiview Data in Big Data Applications
Abstract
High-dimensional multiview data are common in big data applications. Classic clustering algorithms, which treat all features as equally relevant, handle such data poorly. To tackle this problem, this paper proposes a novel intelligent weighting k-means clustering (IWKM) algorithm based on swarm intelligence. First, the degree of coupling between clusters is introduced into the clustering model to enlarge the dissimilarity between clusters, and separate weights for views and features are used in the weighted distance function that assigns objects to clusters. Second, to eliminate the sensitivity to initial cluster centers, swarm intelligence is used to find the initial cluster centers, view weights, and feature weights by a global search. Lastly, a precise perturbation is proposed to improve the optimization performance of swarm intelligence. To verify clustering performance on high-dimensional multiview data, experiments were run in five big data applications on two computational platforms, Apache Spark and a single node, using the evaluation metrics Rand Index, Jaccard Coefficient and Folkes Russe. The experimental results show that IWKM is effective and efficient at clustering high-dimensional multiview data, and that it outperforms five other approaches on these complicated data sets with more views and higher dimensions on both Apache Spark and a single node.
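The weighted assignment step and two of the evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the IWKM distance function, so the form used here (squared differences scaled by the product of a view weight and a feature weight) is an assumption, and every function and parameter name is hypothetical. The swarm-based search, the cluster-coupling term, and the perturbation step are not shown. Rand Index and Jaccard Coefficient use their standard pair-counting definitions.

```python
from itertools import combinations


def weighted_sq_distance(x, center, view_of_feature, view_weights, feature_weights):
    """Squared distance with each feature scaled by its view weight and feature weight.

    Assumed form for illustration only; the abstract does not state the actual formula.
    """
    return sum(
        view_weights[view_of_feature[j]] * feature_weights[j] * (x[j] - center[j]) ** 2
        for j in range(len(x))
    )


def assign_clusters(points, centers, view_of_feature, view_weights, feature_weights):
    """Assign every point to the center with the smallest weighted distance."""
    labels = []
    for x in points:
        distances = [
            weighted_sq_distance(x, c, view_of_feature, view_weights, feature_weights)
            for c in centers
        ]
        labels.append(distances.index(min(distances)))
    return labels


def pair_counts(true_labels, pred_labels):
    """Count object pairs by whether each labeling puts the pair in the same cluster."""
    a = b = c = d = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1  # together in both labelings
        elif same_true:
            b += 1  # together in the reference only
        elif same_pred:
            c += 1  # together in the prediction only
        else:
            d += 1  # apart in both labelings
    return a, b, c, d


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two labelings agree."""
    a, b, c, d = pair_counts(true_labels, pred_labels)
    total = a + b + c + d
    return (a + d) / total if total else 1.0


def jaccard_coefficient(true_labels, pred_labels):
    """Pairs together in both labelings over pairs together in at least one."""
    a, b, c, _ = pair_counts(true_labels, pred_labels)
    return a / (a + b + c) if (a + b + c) else 1.0


if __name__ == "__main__":
    # Two views: features 0 and 1 belong to view 0, feature 2 (noisy) to view 1.
    points = [(0.0, 0.0, 5.0), (0.2, 0.1, -4.0), (3.0, 3.0, 4.5), (3.1, 2.9, -5.0)]
    centers = [(0.0, 0.0, 0.0), (3.0, 3.0, 0.0)]
    labels = assign_clusters(points, centers, [0, 0, 1], [0.9, 0.1], [0.5, 0.5, 1.0])
    print(labels)  # [0, 0, 1, 1]
    print(rand_index([0, 0, 1, 1], labels))  # 1.0
```

In this toy example the noisy third feature sits in a view with a low weight, so the assignment is driven by the first view. In the paper's setting the weights and initial centers would come from the swarm-intelligence search rather than being fixed by hand.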
Year
2020
DOI
10.1016/j.neucom.2018.12.093
Venue
Neurocomputing
Keywords
Big data, Clustering, High dimension multiview data, Optimization, Spark
DocType
Journal
Volume
393
ISSN
0925-2312
Citations
1
PageRank
0.34
References
0
Authors
4
Name | Order | Citations | PageRank
Qian Tao | 1 | 59 | 14.00
Chunqin Gu | 2 | 1 | 0.34
Zhenyu Wang | 3 | 11 | 3.86
Daoning Jiang | 4 | 1 | 0.34