Title
Exploiting Hierarchy in Text Categorization
Abstract
With the recent dramatic increase in electronic access to documents, text categorization (the task of assigning topics to a given document) has moved to the center of the information sciences and knowledge management. This article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization: according to their meaning, topics can be grouped together into "meta-topics", e.g., gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, as opposed to a flat model that ignores the structure. It accommodates both single and multiple topic assignments for each document. Its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups. This allows the individual models for each topic on the second level to focus on finer discriminations within the group. Evaluating the performance of a two-level implementation on the Reuters-22173 testbed of newswire articles shows the most significant improvement for rare classes.
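The two-level idea in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading of the "probabilistic interpretation", that a topic's probability is the product of its meta-topic's probability and the topic's probability within that group; the abstract does not spell out this rule. The function names, the "grains" group, the probability values, and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def combine_two_level(
    meta_probs: Dict[str, float],
    within_probs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Assumed rule: P(topic | doc) = P(meta | doc) * P(topic | meta, doc)."""
    topic_probs: Dict[str, float] = {}
    for meta, p_meta in meta_probs.items():
        # Second-level models only discriminate among topics of their own group.
        for topic, p_topic in within_probs.get(meta, {}).items():
            topic_probs[topic] = p_meta * p_topic
    return topic_probs


def assign_topics(topic_probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Thresholding each topic separately allows zero, one, or several topics."""
    return sorted(t for t, p in topic_probs.items() if p >= threshold)


# Hypothetical first-level output: probabilities of the meta-topic groups.
meta_probs = {"metals": 0.9, "grains": 0.2}

# Hypothetical second-level output: per-topic probabilities within each group.
within_probs = {
    "metals": {"gold": 0.8, "silver": 0.1, "copper": 0.3},
    "grains": {"wheat": 0.7, "corn": 0.6},
}

topic_probs = combine_two_level(meta_probs, within_probs)
assigned = assign_topics(topic_probs, threshold=0.5)
print(assigned)
```

In this sketch a topic such as wheat scores high within its group but is suppressed because the first level considers the group unlikely, which is the kind of division of labor the abstract describes between the two levels.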
Year
1999
DOI
10.1023/A:1009983522080
Venue
Inf. Retr.
Keywords
information retrieval, text mining, topic spotting, text categorization, knowledge management, problem decomposition, machine learning, neural networks, probabilistic models, hierarchical models, performance evaluation
DocType
Journal
Volume
1
Issue
3
ISSN
1573-7659
Citations
89
PageRank
8.58
References
14
Authors
3
Name                Order  Citations  PageRank
Andreas S. Weigend  1      576        112.30
Erik D. Wiener      2      89         8.58
Jan O. Pedersen     3      3011       177.07