Title
Spectral Clustering of Large-scale Data by Directly Solving Normalized Cut.
Abstract
During the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexities hinder their applications on large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has a quadratic time complexity, which is a significant reduction comparing with the cubic time complexity of the traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexities is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high quality anchors that best represent the whole data set, we propose a Balanced k-means (BKM) to partition a data set into balanced clusters and use the cluster centers as anchors. Then DNC is used to obtain the final clustering result from the representative similarity matrix. A series of experiments were conducted on both synthetic data and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.
Year
DOI
Venue
2018
10.1145/3219819.3220039
KDD
Keywords
Field
DocType
Clustering,normalized cut,large-scale data
Cluster (physics),Data mining,Spectral clustering,Data set,Normalization (statistics),Computer science,Algorithm,Synthetic data,Partition (number theory),Cluster analysis,Time complexity
Conference
ISBN
Citations 
PageRank 
978-1-4503-5552-0
12
0.50
References 
Authors
18
6
Name
Order
Citations
PageRank
Xiaojun Chen11298107.51
Weijun Hong2150.88
Feiping Nie37061309.42
Dan He4140.88
Min Yang515541.56
Joshua Zhexue Huang6136582.64