Title
Model Selection in Unsupervised Learning with Applications To Document Clustering
Abstract
This paper addresses the problem of model structure determination in unsupervised learning. In contrast to previous work, we define model selection as determining both the optimal number of clusters and the optimal feature set. We formulate this problem in a Bayesian statistical estimation framework and propose a general expression for an objective function whose maximum is taken to correspond to the optimal model structure. By making assumptions about the generative model, we derive a closed-form expression for this objective in the document clustering setting. We then provide heuristics that find the optimum (or at least a sub-optimum) of this objective function with respect to the feature set and the number of clusters. Finally, we evaluate our objective function and optimization algorithm through extensive experimentation and comparison against a non-trivial data set with expert-judged classes. We find good agreement between the clusters produced by our objective function and algorithm and the expert-judged classes. We also show that our approach compares favorably with other approaches in the literature.
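To illustrate how a closed-form Bayesian objective can score the number of clusters and the feature set together, here is a minimal Python sketch built on assumptions of our own; it is not the expression derived in the paper. It assumes hard cluster assignments, a cluster-specific multinomial over a chosen set of "useful" features, a single multinomial shared by all clusters over the remaining "noise" features, symmetric Dirichlet priors on the multinomial parameters and cluster proportions, and a uniform Beta prior on whether a token is a useful or a noise word. Integrating out the parameters turns the score into a sum of log Dirichlet-multinomial terms. The names `log_dirichlet_multinomial`, `log_model_score`, `alpha`, and `beta` are hypothetical.

```python
from math import lgamma


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal probability of a token sequence with these per-category
    counts, under a multinomial with a symmetric Dirichlet(alpha) prior."""
    v, n = len(counts), sum(counts)
    if v == 0:
        return 0.0  # no categories, nothing to explain
    score = lgamma(v * alpha) - lgamma(v * alpha + n)
    for c in counts:
        score += lgamma(alpha + c) - lgamma(alpha)
    return score


def log_model_score(docs, labels, useful, alpha=1.0, beta=1.0):
    """Hypothetical objective for a (clustering, feature set) pair.

    docs   -- list of word-count vectors, one list of ints per document
    labels -- cluster index (0..K-1) for each document
    useful -- feature indices modeled per cluster; all other features are
              'noise' drawn from one multinomial shared by every cluster
    """
    is_useful = set(useful)
    vocab = range(len(docs[0]))
    useful_ws = [w for w in vocab if w in is_useful]
    noise_ws = [w for w in vocab if w not in is_useful]
    k = max(labels) + 1
    score = 0.0

    # Is each token a useful or a noise word? Uniform Beta(1, 1) prior,
    # included only when both feature groups are non-empty.
    if useful_ws and noise_ws:
        n_useful = sum(d[w] for d in docs for w in useful_ws)
        n_noise = sum(d[w] for d in docs for w in noise_ws)
        score += log_dirichlet_multinomial([n_useful, n_noise], 1.0)

    # Noise features: a single multinomial pooled over all documents.
    score += log_dirichlet_multinomial(
        [sum(d[w] for d in docs) for w in noise_ws], alpha)

    # Useful features: a separate multinomial for each cluster.
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        score += log_dirichlet_multinomial(
            [sum(d[w] for d in members) for w in useful_ws], alpha)

    # Cluster sizes under a symmetric Dirichlet(beta) prior on proportions.
    score += log_dirichlet_multinomial(
        [labels.count(c) for c in range(k)], beta)
    return score


docs = [[10, 0, 2, 2], [10, 0, 2, 2], [0, 10, 2, 2], [0, 10, 2, 2]]
two = log_model_score(docs, [0, 0, 1, 1], {0, 1})
one = log_model_score(docs, [0, 0, 0, 0], {0, 1})
swapped = log_model_score(docs, [0, 0, 1, 1], {2, 3})
print(two > one, two > swapped)  # True True
```

In this toy data, words 0 and 1 separate the documents while words 2 and 3 do not, so the two-cluster model with useful features {0, 1} outscores both the one-cluster model and the swapped feature set. Because every token is explained under every candidate, scores are comparable across different numbers of clusters and feature subsets, which is what a search heuristic over both would rely on.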
Year: 1999
Venue: ICML
Keywords: unsupervised learning, document clustering, model selection
Field: Competitive learning, Pattern recognition, Document clustering, Computer science, Model selection, Unsupervised learning, Artificial intelligence, Conceptual clustering, Cluster analysis, Machine learning
DocType: Conference
ISBN: 1-55860-612-2
Citations: 34
PageRank: 9.84
References: 1
Authors (2): Shivakumar Vaithyanathan, Byron Dom