Title
Semi-supervised fuzzy co-clustering algorithm for document categorization.
Abstract
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm, SS-FCC, for categorizing large web documents. The approach incorporates prior domain knowledge about a dataset into the fuzzy co-clustering framework in the form of user-provided pairwise constraints, each specifying whether a pair of objects “must” or “cannot” be clustered together. With these constraints, the clustering problem is formulated as maximizing a competitive agglomeration cost function with fuzzy terms that takes the provided domain knowledge into account. We derive the update rules for the fuzzy memberships and design an iterative algorithm for the soft co-clustering process. Our experimental studies show that the proposed approach can significantly improve the quality of clustering results. Simulations on 10 large benchmark datasets demonstrate the strength and potential of SS-FCC in terms of performance evaluation criteria, stability, and running time, compared with some existing semi-supervised algorithms.
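The abstract does not give the SS-FCC cost function, so the following is only a minimal sketch of how must-link and cannot-link constraints can be scored against fuzzy memberships. It assumes one common convention: the degree to which two objects share clusters is the sum over clusters of the product of their memberships. The names `co_membership` and `constraint_score` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def co_membership(u: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Degree to which objects i and j share clusters: sum_c u[i][c] * u[j][c]."""
    return sum(a * b for a, b in zip(u[i], u[j]))


def constraint_score(
    u: Sequence[Sequence[float]],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> float:
    """Illustrative constraint term (an assumption, not the SS-FCC objective).

    Rewards must-link pairs that share clusters and penalizes
    cannot-link pairs that do. u[i][c] is the fuzzy membership of
    object i in cluster c, with each row summing to 1.
    """
    reward = sum(co_membership(u, i, j) for i, j in must_link)
    penalty = sum(co_membership(u, i, j) for i, j in cannot_link)
    return reward - penalty


# Toy example: three documents, two clusters.
must_link = [(0, 1)]    # documents 0 and 1 must be clustered together
cannot_link = [(0, 2)]  # documents 0 and 2 cannot be clustered together

# Memberships that respect both constraints.
consistent = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
# Memberships that violate both constraints.
conflicting = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]

consistent_score = constraint_score(consistent, must_link, cannot_link)
conflicting_score = constraint_score(conflicting, must_link, cannot_link)
```

In a semi-supervised objective that is maximized, a term of this kind would push the membership updates toward assignments like `consistent`, which scores higher than `conflicting`. How SS-FCC actually weights and combines such a term with the competitive agglomeration cost is defined in the paper itself.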
Year
2013
DOI
10.1007/s10115-011-0454-9
Venue
Knowl. Inf. Syst.
Keywords
Semi-supervised clustering, Fuzzy co-clustering, Must-link/Cannot-link constraint, Document categorization
Field
Fuzzy clustering, Data mining, Fuzzy classification, Fuzzy set operations, Computer science, FLAME clustering, Artificial intelligence, Fuzzy number, Cluster analysis, Correlation clustering, Algorithm, Constrained clustering, Machine learning
DocType
Journal
Volume
34
Issue
1
ISSN
0219-3116
Citations
8
PageRank
0.47
References
41
Authors
3
Name | Order | Citations | PageRank
Yang Yan | 1 | 8 | 0.47
Lihui Chen | 2 | 380 | 27.30
William-Chandra Tjhi | 3 | 156 | 10.09