Title
A hybrid unsupervised approach for document clustering
Abstract
We propose a hybrid, unsupervised document clustering approach that combines a hierarchical clustering algorithm with Expectation Maximization. We developed several heuristics to automatically select a subset of the clusters generated by the first algorithm as the initial points of the second one. Furthermore, our initialization algorithm generates not only an initial model for the iterative refinement algorithm but also an estimate of the model dimension, thus eliminating another important element of human supervision. We have evaluated the proposed system on five real-world document collections. The results show that our approach generates clustering solutions of higher quality than either of its individual components.
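The abstract does not name the hierarchical algorithm, the seed-selection heuristics, or the probability model that EM fits. The Python sketch below fills those gaps with explicit assumptions, so it illustrates the general "hierarchical seeding, then EM refinement" pattern and is not the authors' implementation. The assumptions are: centroid-linkage agglomerative clustering with cosine similarity, a similarity threshold that stops merging (so the number of surviving clusters is not fixed in advance), a minimum-cluster-size filter as a stand-in selection heuristic, and EM over a Laplace-smoothed multinomial (naive Bayes) mixture of term counts. All function and parameter names (`agglomerate`, `hybrid_cluster`, `threshold`, `min_size`, and so on) are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(docs, threshold):
    """Centroid-linkage agglomerative clustering (assumed linkage).

    Stops when the best merge similarity falls below `threshold` (assumed
    stopping rule), so the number of surviving clusters is data-driven.
    """
    clusters = [([i], Counter(d)) for i, d in enumerate(docs)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in pair] + [merged]
    return clusters


def m_step(docs, resp, k, vocab, alpha=1.0):
    """Re-estimate mixture priors and per-cluster word distributions (Laplace-smoothed)."""
    mass = [0.0] * k
    word_probs = []
    for c in range(k):
        counts = Counter()
        for d, r in zip(docs, resp):
            if r[c] > 0:
                mass[c] += r[c]
                for t, n in d.items():
                    counts[t] += r[c] * n
        total = sum(counts.values()) + alpha * len(vocab)
        word_probs.append({t: (counts[t] + alpha) / total for t in vocab})
    z = sum(mass) + alpha * k
    return [(m + alpha) / z for m in mass], word_probs


def e_step(docs, priors, word_probs):
    """Posterior cluster memberships for each document, computed in log space."""
    resp = []
    for d in docs:
        logs = [math.log(p) + sum(n * math.log(wp[t]) for t, n in d.items())
                for p, wp in zip(priors, word_probs)]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def hybrid_cluster(texts, threshold=0.2, min_size=2, iters=20):
    """Hierarchical seeding followed by EM refinement; returns (labels, k)."""
    docs = [Counter(t.split()) for t in texts]
    vocab = sorted(set().union(*docs))
    clusters = agglomerate(docs, threshold)
    # Assumed selection heuristic: keep clusters with at least min_size members,
    # falling back to all clusters if none qualify.
    seeds = [m for m, _ in clusters if len(m) >= min_size] or [m for m, _ in clusters]
    k = len(seeds)  # model dimension estimated from the hierarchy, not user-supplied
    # Initial model: hard assignments for seed members; other documents start unassigned.
    resp = [[1.0 if i in s else 0.0 for s in seeds] for i in range(len(docs))]
    for _ in range(iters):
        priors, word_probs = m_step(docs, resp, k, vocab)
        resp = e_step(docs, priors, word_probs)
    labels = [max(range(k), key=r.__getitem__) for r in resp]
    return labels, k
```

For example, calling `hybrid_cluster` on three short fruit documents ("apple banana apple", "banana apple fruit", "apple fruit banana") and three vehicle documents ("car engine wheel", "engine car road", "wheel road car") yields k = 2 and puts each topic in its own cluster, without the number of clusters being supplied. Documents outside every selected seed start with zero responsibility, so the first M-step is estimated only from the seed members. The E-step that follows then assigns every document.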
Year
2005
DOI
10.1145/1081870.1081957
Venue
KDD
Keywords
document clustering, initial point, real-world document collection, initialization algorithm, higher quality, hierarchical clustering algorithm, hybrid unsupervised approach, iterative refinement algorithm, model dimension, expectation maximization, unsupervised document, initial model, hierarchical clustering
Field
Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Single-linkage clustering, Canopy clustering algorithm, Data stream clustering, Correlation clustering, Pattern recognition, Constrained clustering, Machine learning
DocType
Conference
ISBN
1-59593-135-X
Citations
14
PageRank
0.81
References
5
Authors
3
Name | Order | Citations | PageRank
Mihai Surdeanu | 1 | 2582 | 174.69
Jordi Turmo | 2 | 306 | 30.52
Alicia Ageno | 3 | 150 | 15.77