Title
Efficient Algorithms for Constrained Clustering with Side Information
Abstract
Clustering, an unsupervised machine learning method, has broad applications in data science and natural language processing. In this paper, we use background knowledge, or side information, about the data as constraints to improve clustering accuracy. Following the representation method of [15], we first express the side information as must-link sets and cannot-link sets. We then propose a constrained k-means algorithm for clustering the data. The key idea for clustering must-link sets is to treat each set as a single data point with large volume, that is, to assign all points of a must-link set as a whole to the center closest to the set's center of mass. In contrast, the key idea for clustering cannot-link sets is to reduce the assignment of the involved data points to the computation of a minimum weight perfect matching. Finally, we carry out numerical simulations on UCI datasets to evaluate our constrained k-means algorithms. The experimental results demonstrate that our method outperforms the previous constrained k-means algorithm as well as classical k-means in both clustering accuracy and runtime.
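The abstract gives no pseudocode, so the following is only a minimal sketch of the two assignment ideas it describes, written under our own assumptions: squared Euclidean distance, initialization from the first k points, must-link and cannot-link sets that are pairwise disjoint, and every cannot-link set having at most k points. The must-link rule relies on the identity sum_i ||x_i - c||^2 = |S| ||mu - c||^2 + sum_i ||x_i - mu||^2 (mu the set's mass center), so the center nearest mu is also the best center for the whole set. For the cannot-link step we read "minimum weight perfect matching" as giving each point of the set a distinct center at minimum total cost, and solve it by brute force over permutations, which is exact but exponential in k; a practical implementation would use a Hungarian-algorithm solver such as SciPy's `linear_sum_assignment`. All function names here are hypothetical.

```python
import math
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean (mass center) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign_must_link(group, centers):
    """Hypothetical helper: send a whole must-link set to the center nearest its mass center."""
    mu = mean(group)
    return min(range(len(centers)), key=lambda j: sq_dist(mu, centers[j]))


def assign_cannot_link(group, centers):
    """Hypothetical helper: give each point a distinct center, minimizing total cost.

    Exhaustive search over injective assignments (exact, small k only).
    Assumes len(group) <= len(centers).
    """
    best_cost, best = math.inf, None
    for perm in permutations(range(len(centers)), len(group)):
        cost = sum(sq_dist(p, centers[j]) for p, j in zip(group, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best)


def constrained_kmeans(points, k, must_link, cannot_link, iters=20):
    """Hypothetical constrained k-means loop (a sketch, not the authors' code).

    must_link / cannot_link are lists of index lists, assumed pairwise disjoint.
    """
    centers = [points[i] for i in range(k)]  # naive init: first k points
    constrained = {i for g in must_link + cannot_link for i in g}
    labels = [0] * len(points)
    for _ in range(iters):
        # Unconstrained points: ordinary nearest-center rule.
        for i, p in enumerate(points):
            if i not in constrained:
                labels[i] = min(range(k), key=lambda j: sq_dist(p, centers[j]))
        # Must-link sets move as one "large-volume" point.
        for g in must_link:
            j = assign_must_link([points[i] for i in g], centers)
            for i in g:
                labels[i] = j
        # Cannot-link sets: matching of points to distinct centers.
        for g in cannot_link:
            for i, j in zip(g, assign_cannot_link([points[i] for i in g], centers)):
                labels[i] = j
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


pts = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
labels, centers = constrained_kmeans(pts, 2, must_link=[[2, 4]], cannot_link=[[0, 1]])
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this toy run, points 2 and 4 always share a cluster and points 0 and 1 always land in different clusters, regardless of how the unconstrained points are assigned.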
Year: 2019
DOI: 10.1007/978-981-15-2767-8_25
Venue: PAAP
Field: Data point, Data set, Computer science, Algorithm, Matching (graph theory), Unsupervised learning, Minimum weight, Constrained clustering, Cluster analysis, Computation
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name              Order  Citations  PageRank
Zhendong Hao      1      0          0.34
Longkun Guo       2      6          5.49
Pei Yao           3      0          1.01
Peihuang Huang    4      0          0.68
Huihong Peng      5      0          0.34