Title
Model-Based Hierarchical Clustering
Abstract
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of the model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering, for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm is a two-stage process: we first perform a flat clustering, then apply a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure, including the number of clusters, the depth of the tree, and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection.
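To make the role of the marginal likelihood concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a symmetric Dirichlet prior with a hypothetical concentration `alpha`, treats each cluster as pooled word counts drawn from one multinomial, and omits both the flat clustering stage and the per-node feature-subset selection described in the abstract. The function names `log_marginal`, `merge_gain`, and `greedy_merge` are illustrative only.

```python
from math import lgamma


def log_marginal(counts, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of pooled word counts.

    Assumes a symmetric Dirichlet(alpha) prior; the multinomial
    coefficients are omitted because they cancel in merge comparisons.
    """
    total_alpha = alpha * len(counts)
    result = lgamma(total_alpha) - lgamma(total_alpha + sum(counts))
    for n in counts:
        result += lgamma(alpha + n) - lgamma(alpha)
    return result


def merge_gain(a, b, alpha=1.0):
    """Change in log marginal likelihood if clusters a and b share one distribution."""
    merged = [x + y for x, y in zip(a, b)]
    return log_marginal(merged, alpha) - log_marginal(a, alpha) - log_marginal(b, alpha)


def greedy_merge(clusters, alpha=1.0):
    """Repeatedly merge the best pair while the marginal likelihood improves."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gain = merge_gain(clusters[i], clusters[j], alpha)
                if best is None or gain > best[0]:
                    best = (gain, i, j)
        gain, i, j = best
        if gain <= 0:
            break  # no merge improves the score, so the structure is kept
        merged = [x + y for x, y in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Three toy clusters over a two-word vocabulary (made-up counts).
    print(greedy_merge([[10, 0], [12, 1], [0, 9]]))
```

In this toy setting, merging stops on its own once no pair improves the score, which illustrates how a marginal-likelihood objective can select the number of clusters without a separate penalty term.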
Year
2013
Venue
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Keywords
marginal likelihood, general model, cluster subsets, model-based hierarchical clustering, merged cluster, flat clustering, unique distribution, optimal model structure, common distribution, cluster hierarchy, hierarchical clustering, probability distribution
Field
Data mining, Computer science, Artificial intelligence, Cluster analysis, Single-linkage clustering, Hierarchical clustering, Complete-linkage clustering, Correlation clustering, Pattern recognition, Hierarchical clustering of networks, Determining the number of clusters in a data set, Brown clustering, Machine learning
DocType
Journal
Volume
abs/1301.3899
ISBN
1-55860-709-9
Citations
30
PageRank
4.37
References
4
Authors
2
Name
Order
Citations
PageRank
Shivakumar Vaithyanathan12518234.40
Byron Dom22600825.93