Title
Active Density-Based Clustering
Abstract
The density-based clustering algorithm DBSCAN is a fundamental technique for data clustering with many attractive properties and applications. However, DBSCAN requires specifying all pairwise (dis)similarities among objects, which can be non-trivial to obtain in many applications. To tackle this problem, we propose a novel active density-based clustering algorithm, named Act-DBSCAN, which works with a restricted budget of pairwise similarities. Act-DBSCAN exploits pairwise lower-bounding (LB) similarities to initialize the cluster structure. It then adaptively selects the most informative pairwise LB similarities, replaces them with the real ones, and reconstructs the result, repeating until the budget is exhausted. The goal is to approximate the true clustering result as closely as possible with each update. Our Act-DBSCAN framework is built upon a proposed probabilistic model that scores the impact of updating each pairwise LB similarity on the intermediate clustering structure. Building on this scoring system and on the monotonicity and reduction properties of our active clustering process, we propose two efficient algorithms to iteratively select and update pairwise similarities and the cluster structure. Experiments on real datasets show that Act-DBSCAN acquires good clustering results with only a few pairwise similarities, and requires only a small fraction of all pairwise similarities to reach the DBSCAN results. Act-DBSCAN also outperforms other related techniques such as active spectral clustering.
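The loop described in the abstract (initialize from lower bounds, spend a limited budget on real values, then recluster) can be illustrated with a minimal sketch. This is not the authors' Act-DBSCAN: the probabilistic scoring model is not given here, so the sketch substitutes a naive placeholder rule (query the uncertain pair with the largest lower bound first). It also works with distances rather than similarities, and the names `dbscan_from_distances`, `active_dbscan_sketch`, and the `true_distance` oracle are hypothetical. The one property it relies on is that a lower bound already above `eps` can never become a neighbor relation, so only pairs with a lower bound within `eps` are worth querying.

```python
from collections import deque


def dbscan_from_distances(dist, eps, min_pts):
    """Plain DBSCAN on a full distance matrix; returns labels (-1 = noise)."""
    n = len(dist)
    # Each point counts as its own neighbor because dist[i][i] == 0.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels


def active_dbscan_sketch(lower_bound, true_distance, eps, min_pts, budget):
    """Cluster with at most `budget` calls to the expensive distance oracle.

    lower_bound   -- symmetric matrix with lower_bound[i][j] <= true distance
    true_distance -- hypothetical oracle, called as true_distance(i, j)
    Returns (labels, number_of_oracle_calls).
    """
    n = len(lower_bound)
    dist = [row[:] for row in lower_bound]  # working copy, starts as all LBs
    # Pairs whose lower bound exceeds eps are certainly not neighbors,
    # so only the remaining pairs can still change the clustering.
    uncertain = [
        (dist[i][j], i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] <= eps
    ]
    # Placeholder selection rule (an assumption, not the paper's scoring):
    # largest lower bound first, since it is closest to crossing eps.
    uncertain.sort(reverse=True)
    used = 0
    for _, i, j in uncertain:
        if used >= budget:
            break
        dist[i][j] = dist[j][i] = true_distance(i, j)
        used += 1
    # For brevity the clustering is rebuilt once, after all updates.
    return dbscan_from_distances(dist, eps, min_pts), used
```

With a budget of zero the result is simply DBSCAN on the lower bounds; as the budget grows, neighbor relations can only be removed, so the clustering is refined toward the DBSCAN result on the real distances.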
Year
2013
DOI
10.1109/ICDM.2013.39
Venue
2013 IEEE 13th International Conference on Data Mining (ICDM)
Keywords
Active clustering, Density-based clustering, Active learning
Field
Data mining, Fuzzy clustering, Computer science, Artificial intelligence, Cluster analysis, Single-linkage clustering, Clustering high-dimensional data, Correlation clustering, Pattern recognition, SUBCLU, Constrained clustering, DBSCAN, Machine learning
DocType
Conference
ISSN
1550-4786
Citations
3
PageRank
0.37
References
0
Authors
5
Name
Order
Citations
PageRank
Son T. Mai1342.25
Xiao He2644.65
Nina Hubig3134.64
Claudia Plant453654.69
Christian Böhm52494528.46