Abstract | | |
---|---|---|
Documents are commonly categorized into hierarchies of topics, such as the ones maintained by Yahoo! and the Open Directory Project, in order to facilitate browsing and other interactive forms of information retrieval. In addition, topic hierarchies can be used to overcome the sparseness problem in text categorization with a large number of categories, which is the main focus of this paper. This paper presents a hierarchical mixture model that extends the standard naive Bayes classifier and previous hierarchical approaches. Improved estimates of the term distributions are obtained by differentiating words in the hierarchy according to their level of generality/specificity. Experiments on the Newsgroups and Reuters-21578 datasets indicate improved performance of the proposed classifier in comparison to other state-of-the-art methods on datasets with a small number of positive examples. |
Year | DOI | Venue |
---|---|---|
2001 | 10.1145/502585.502604 | CIKM |
Keywords | Field | DocType |
information retrieval,open directory project,text classification,large number,improved estimate,standard naive bayes classifier,hierarchical mixture model,reuters-21578 dataset,small number,proposed classifier,small training set,previous hierarchical approach,naive bayes classifier,mixture model | Small number,Data mining,Directory,Computer science,Artificial intelligence,Hierarchy,Classifier (linguistics),Generality,Document classification,Naive Bayes classifier,Information retrieval,Mixture model,Machine learning | Conference |
ISBN | Citations | PageRank |
1-58113-436-3 | 37 | 2.95 |
References | Authors | |
12 | 4 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Kristina Toutanova | 1 | 3131 | 152.55 |
Francine Chen | 2 | 1218 | 153.96 |
Kris Popat | 3 | 212 | 23.08 |
Thomas Hofmann | 4 | 10064 | 1001.83 |