Text classification in a hierarchical mixture model for small training sets - Citegraph

Paper Info

Title
Text classification in a hierarchical mixture model for small training sets

Abstract
Documents are commonly categorized into hierarchies of topics, such as the ones maintained by Yahoo! and the Open Directory project, in order to facilitate browsing and other interactive forms of information retrieval. In addition, topic hierarchies can be utilized to overcome the sparseness problem in text categorization with a large number of categories, which is the main focus of this paper. This paper presents a hierarchical mixture model which extends the standard naive Bayes classifier and previous hierarchical approaches. Improved estimates of the term distributions are made by differentiation of words in the hierarchy according to their level of generality/specificity. Experiments on the Newsgroups and Reuters-21578 datasets indicate improved performance of the proposed classifier in comparison to other state-of-the-art methods on datasets with a small number of positive examples.
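To make the abstract's idea more concrete, here is a minimal Python sketch of one common way to extend naive Bayes with a topic hierarchy. Each leaf class's term distribution is a convex mixture of smoothed distributions estimated at every node on its root-to-leaf path. Rare leaf-level counts can then borrow strength from the pooled statistics of more general nodes. This is a shrinkage-style illustration under stated assumptions, not the authors' model. The hierarchy, the toy documents, the Laplace smoothing, and the fixed per-class mixing weights are all hypothetical. The paper instead differentiates individual words by their level of generality/specificity, which this fixed-weight version does not attempt.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution P(w | node) from tokenized docs."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab)
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}


def train(leaf_docs, paths, weights, vocab):
    """Mix node-level term distributions along each leaf's root-to-leaf path.

    leaf_docs: leaf -> list of tokenized training documents
    paths:     leaf -> list of nodes from root to leaf (inclusive)
    weights:   leaf -> mixing weights over that path (summing to 1). They are
               fixed here by assumption; a hierarchical mixture model would
               estimate them from data (e.g. with EM).
    """
    # Each node pools the training documents of every leaf beneath it.
    node_docs = {}
    for leaf, path in paths.items():
        for node in path:
            node_docs.setdefault(node, []).extend(leaf_docs[leaf])
    node_dist = {n: term_distribution(d, vocab) for n, d in node_docs.items()}

    # P(w | leaf) = sum over path nodes n of lambda_n * P(w | n)
    return {
        leaf: {w: sum(lam * node_dist[n][w] for lam, n in zip(weights[leaf], path))
               for w in vocab}
        for leaf, path in paths.items()
    }


def classify(doc, mixed, priors):
    """Naive Bayes decision: argmax_c log P(c) + sum_w log P(w | c).

    Out-of-vocabulary words are skipped.
    """
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(mixed[c][w]) for w in doc if w in mixed[c])
    return max(mixed, key=score)


# Hypothetical toy hierarchy and data, for illustration only.
vocab = ["ball", "team", "score", "vote", "election", "news"]
leaf_docs = {
    "sports/soccer": [["ball", "team", "news"]],
    "sports/tennis": [["ball", "score", "news"]],
    "politics": [["vote", "election", "news"]],
}
paths = {
    "sports/soccer": ["root", "sports", "sports/soccer"],
    "sports/tennis": ["root", "sports", "sports/tennis"],
    "politics": ["root", "politics"],
}
weights = {
    "sports/soccer": [0.2, 0.3, 0.5],
    "sports/tennis": [0.2, 0.3, 0.5],
    "politics": [0.3, 0.7],
}
priors = {c: 1 / 3 for c in leaf_docs}

mixed = train(leaf_docs, paths, weights, vocab)
print(classify(["team", "ball"], mixed, priors))  # → sports/soccer
```

With this toy data the query ["team", "ball"] goes to sports/soccer. Both sports leaves receive the same estimate for "ball" through the shared sports node, so the decision rests on "team", whose leaf-level estimate is higher under soccer. Because each node distribution sums to one and the weights sum to one, every mixed leaf distribution is still a proper probability distribution.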

Year: 2001
DOI: 10.1145/502585.502604
Venue: CIKM
Keywords: information retrieval, open directory project, text classification, large number, improved estimate, standard naive bayes classifier, hierarchical mixture model, reuters-21578 dataset, small number, proposed classifier, small training set, previous hierarchical approach, naive bayes classifier, mixture model
Field: Small number, Data mining, Directory, Computer science, Artificial intelligence, Hierarchy, Classifier (linguistics), Generality, Document classification, Naive Bayes classifier, Information retrieval, Mixture model, Machine learning
DocType: Conference
ISBN: 1-58113-436-3
Citations: 37
PageRank: 2.95
References: 12
Authors: 4

Authors

Name	Author order	Author citations	Author PageRank
Kristina Toutanova	1	3131	152.55
Francine Chen	2	1218	153.96
Kris Popat	3	212	23.08
Thomas Hofmann	4	10064	1001.83
