Title
Text classification in a hierarchical mixture model for small training sets
Abstract
Documents are commonly categorized into hierarchies of topics, such as the ones maintained by Yahoo! and the Open Directory Project, in order to facilitate browsing and other interactive forms of information retrieval. In addition, topic hierarchies can be utilized to overcome the sparseness problem in text categorization with a large number of categories, which is the main focus of this paper. This paper presents a hierarchical mixture model which extends the standard naive Bayes classifier and previous hierarchical approaches. Improved estimates of the term distributions are made by differentiation of words in the hierarchy according to their level of generality/specificity. Experiments on the Newsgroups and Reuters-21578 datasets indicate improved performance of the proposed classifier in comparison to other state-of-the-art methods on datasets with a small number of positive examples.
Year
2001
DOI
10.1145/502585.502604
Venue
CIKM
Keywords
information retrieval, open directory project, text classification, large number, improved estimate, standard naive bayes classifier, hierarchical mixture model, reuters-21578 dataset, small number, proposed classifier, small training set, previous hierarchical approach, naive bayes classifier, mixture model
Field
Small number, Data mining, Directory, Computer science, Artificial intelligence, Hierarchy, Classifier (linguistics), Generality, Document classification, Naive Bayes classifier, Information retrieval, Mixture model, Machine learning
DocType
Conference
ISBN
1-58113-436-3
Citations
37
PageRank
2.95
References
12
Authors
4
Name
Order
Citations
PageRank
Kristina Toutanova13131152.55
Francine Chen21218153.96
Kris Popat321223.08
Thomas Hofmann4100641001.83