Abstract | ||
---|---|---|
This paper addresses the problem of model structure determination in unsupervised learning. In contrast to previous work, we define the problem of model selection as determining both the optimal number of clusters and the optimal feature set. We formulate this problem in a Bayesian statistical estimation framework and propose a general expression for an objective function whose maximum is taken to correspond to the optimal model structure. By making assumptions about the generative model, we derive a closed-form expression for the document clustering problem. We then provide heuristics that find the optimum (or at least a sub-optimum) of this objective function in terms of the feature sets and the number of clusters. Finally, we evaluate our objective function and optimization algorithm through extensive experimentation and comparison against a non-trivial, expert-judged data set. We find good agreement between the clusters produced using our objective function and algorithm and the expert-judged classes. We also show that our approach compares favorably with other approaches in the literature. |
Year | Venue | Keywords |
---|---|---|
1999 | ICML | unsupervised learning,document clustering,model selection |
Field | DocType | ISBN |
Competitive learning,Pattern recognition,Document clustering,Computer science,Model selection,Unsupervised learning,Artificial intelligence,Conceptual clustering,Cluster analysis,Machine learning | Conference | 1-55860-612-2 |
Citations | PageRank | References |
34 | 9.84 | 1 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shivakumar Vaithyanathan | 1 | 2518 | 234.40 |
Byron Dom | 2 | 2600 | 825.93 |