SimClus: an effective algorithm for clustering with a lower bound on similarity - Citegraph

Paper Info

Title
SimClus: an effective algorithm for clustering with a lower bound on similarity

Abstract
Clustering algorithms generally accept a parameter k from the user, which determines the number of clusters sought. However, in many application domains, like document categorization, social network clustering, and frequent pattern summarization, the proper value of k is difficult to guess. An alternative clustering formulation that does not require k is to impose a lower bound on the similarity between an object and its corresponding cluster representative. Such a formulation chooses exactly one representative for every cluster and minimizes the representative count. It has many additional benefits. For instance, it supports overlapping clusters in a natural way. Moreover, for every cluster, it selects a representative object, which can be effectively used in summarization or semi-supervised classification task. In this work, we propose an algorithm, SimClus, for clustering with lower bound on similarity. It achieves a O(log n) approximation bound on the number of clusters, whereas for the best previous algorithm the bound can be as poor as O(n). Experiments on real and synthetic data sets show that our algorithm produces more than 40% fewer representative objects, yet offers the same or better clustering quality. We also propose a dynamic variant of the algorithm, which can be effectively used in an on-line setting.

Year	DOI	Venue
2011	10.1007/s10115-010-0360-6	Knowl. Inf. Syst.
Keywords	Field	DocType
representative count,parameter k,social network clustering,clustering algorithm,dominating set · overlapping clustering · set cover · star clustering,alternative clustering formulation,representative object,corresponding cluster representative,previous algorithm,effective algorithm,clustering quality,fewer representative object,star clusters,dominating set,lower bound,set cover,synthetic data,social network	Data mining,Fuzzy clustering,CURE data clustering algorithm,Computer science,Artificial intelligence,Cluster analysis,Single-linkage clustering,Canopy clustering algorithm,Pattern recognition,Correlation clustering,Determining the number of clusters in a data set,Algorithm,Constrained clustering,Machine learning	Journal
Volume	Issue	ISSN
28	3	0219-3116
Citations	PageRank	References
13	0.64	20
Authors
3

Authors (3 rows)

Cited by (13 rows)

References (20 rows)

Name	Order	Citations	PageRank
Mohammad Al Hasan	1	427	35.08
Saeed Salem	2	182	17.39
Mohammed Javeed Zaki	3	7972	536.24

1