Title
A Fast and Flexible Clustering Algorithm Using Binary Discretization
Abstract
We present a new clustering algorithm for multivariate data. This algorithm, called BOOL (Binary coding Oriented clustering), can detect arbitrarily shaped clusters and is noise tolerant. BOOL handles data using a two-step procedure: data points are first discretized and represented as binary words; clusters are then iteratively constructed by agglomerating smaller clusters using this representation. The latter step is carried out with linear complexity by sorting the binary representations, which results in dramatic speedups compared with other techniques. Experiments show that BOOL is faster than K-means, and about two to three orders of magnitude faster than two state-of-the-art algorithms that can detect non-convex clusters of arbitrary shapes. We also show that BOOL's results are robust to changes in parameters, whereas most algorithms for arbitrarily shaped clusters are known to be overly sensitive to such changes. The key to the robustness of BOOL is the hierarchical structure of clusters that is introduced automatically by increasing the accuracy of the discretization.
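The two-step idea described in the abstract (discretize, then agglomerate on the discretized representation) can be illustrated with a minimal sketch. This is not the authors' implementation: it merges occupied grid cells that share a face using a union-find structure instead of the sorting-based procedure the abstract mentions, and the names `discretize`, `cluster`, `level`, and `min_size` are hypothetical choices made for this example.

```python
from collections import defaultdict


def discretize(points, level):
    """Map each point to a tuple of integer cell indices on a grid with
    2**level cells per axis, after min-max scaling every dimension."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    cells_per_axis = 2 ** level
    codes = []
    for p in points:
        code = []
        for d in range(dims):
            span = highs[d] - lows[d]
            unit = (p[d] - lows[d]) / span if span > 0 else 0.0
            # Clamp so the maximum value falls in the last cell.
            code.append(min(int(unit * cells_per_axis), cells_per_axis - 1))
        codes.append(tuple(code))
    return codes


def cluster(points, level, min_size=1):
    """Assumed simplification: merge occupied cells that differ by one
    step along a single axis. Groups smaller than min_size get label -1."""
    codes = discretize(points, level)
    members = defaultdict(list)
    for idx, code in enumerate(codes):
        members[code].append(idx)

    parent = {code: code for code in members}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for code in members:
        for d in range(len(code)):
            neighbour = code[:d] + (code[d] + 1,) + code[d + 1:]
            if neighbour in parent:
                parent[find(code)] = find(neighbour)

    groups = defaultdict(list)
    for code, idxs in members.items():
        groups[find(code)].extend(idxs)

    labels = [-1] * len(points)
    next_label = 0
    # Number clusters by the smallest point index they contain.
    for idxs in sorted(groups.values(), key=min):
        if len(idxs) >= min_size:
            for i in idxs:
                labels[i] = next_label
            next_label += 1
    return labels


points = [(0.0, 0.0), (0.1, 0.1), (0.3, 0.1),
          (0.9, 0.9), (1.0, 1.0), (0.9, 0.6),
          (0.1, 0.9)]
fine = cluster(points, level=2, min_size=2)
coarse = cluster(points, level=0)
```

In this toy input, `level=2` separates two groups and marks the isolated point as noise, while `level=0` places every point in a single cell. That mirrors, in a very reduced form, the hierarchy the abstract attributes to increasing the discretization accuracy.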
Year
2011
DOI
10.1109/ICDM.2011.9
Venue
ICDM
Keywords
dramatic speedup, binary discretization, hierarchical structure, data point, flexible clustering algorithm, state-of-the-art algorithm, oriented clustering, multivariate data, binary representation, new clustering algorithm, arbitrary shape, binary word, k-means, computational complexity, sorting, hierarchical clustering, data handling, data mining, discretization, learning artificial intelligence
Field
Discretization, Data mining, Computer science, Robustness (computer science), Artificial intelligence, Cluster analysis, Binary number, Hierarchical clustering, Pattern recognition, Binary code, Sorting, Machine learning, Computational complexity theory
DocType
Conference
Citations
4
PageRank
0.42
References
5
Authors
2
Name              Order  Citations  PageRank
Mahito Sugiyama   1      77         13.27
Akihiro Yamamoto  2      135        26.84