Exploiting Hierarchy in Text Categorization - Citegraph

Paper Info

Title
Exploiting Hierarchy in Text Categorization

Abstract
With the recent dramatic increase in electronic access to documents, text categorization, the task of assigning topics to a given document, has moved to the center of the information sciences and knowledge management. This article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization: according to their meaning, topics can be grouped together into “meta-topics”, e.g., gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, as opposed to a flat model that ignores the structure. It accommodates both single and multiple topic assignments for each document. Its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups. This allows the individual models for each topic on the second level to focus on finer discriminations within the group. Evaluating the performance of a two-level implementation on the Reuters-22173 testbed of newswire articles shows the most significant improvement for rare classes.
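
The two-level idea in the abstract can be illustrated with a minimal sketch. It assumes, as one natural reading of the probabilistic interpretation, that a topic's probability is the product of its meta-topic probability and its probability within that group; the abstract does not spell out the combination rule. The "grains" group, the function names, the threshold, and all numbers are hypothetical, and the hard-coded dictionaries stand in for trained first- and second-level models.

```python
# Hypothetical two-level topic hierarchy. "metals" follows the abstract's
# example; "grains" and its topics are made up for illustration.
HIERARCHY = {
    "metals": ["gold", "silver", "copper"],
    "grains": ["wheat", "corn"],
}


def combine(meta_probs, topic_probs_given_meta):
    """Assumed rule: P(topic | doc) = P(meta | doc) * P(topic | meta, doc)."""
    combined = {}
    for meta, topics in HIERARCHY.items():
        for topic in topics:
            combined[topic] = meta_probs[meta] * topic_probs_given_meta[meta][topic]
    return combined


def assign(combined, threshold=0.5):
    """Multi-topic assignment: keep every topic at or above the threshold."""
    return sorted(t for t, p in combined.items() if p >= threshold)


# Made-up outputs standing in for the first-level (meta-topic) model
# and the second-level (per-topic) models for a single document.
meta_probs = {"metals": 0.9, "grains": 0.1}
topic_probs_given_meta = {
    "metals": {"gold": 0.8, "silver": 0.6, "copper": 0.1},
    "grains": {"wheat": 0.7, "corn": 0.2},
}

combined = combine(meta_probs, topic_probs_given_meta)
print(assign(combined))
```

In this sketch each second-level score is treated independently, so a document can receive several topics, one, or none. A high within-group score for "wheat" is suppressed because the first level gives its group a low probability.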

Field	Value
Year	1999
DOI	10.1023/A:1009983522080
Venue	Inf. Retr.
Keywords	information retrieval, text mining, topic spotting, text categorization, knowledge management, problem decomposition, machine learning, neural networks, probabilistic models, hierarchical models, performance evaluation
DocType	Journal
Volume	1
Issue	3
ISSN	1573-7659
Citations	89
PageRank	8.58
References	14
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Andreas S. Weigend	1	576	112.30
Erik D. Wiener	2	89	8.58
Jan O. Pedersen	3	6301	1177.07
