Title
A rough set-based hybrid method to text categorization
Abstract
In this paper we present a hybrid text categorization method based on Rough Sets theory. A central problem in text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data, which may contain many unnecessary and irrelevant features. To cope with this problem, we propose a hybrid technique that combines Latent Semantic Indexing (LSI) and Rough Sets theory (RS). Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between classes of documents, reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and allows easy editing of the rule bases employed. In addition, we generate several knowledge bases instead of a single one for the classification of new objects, in the expectation that combining the answers of the multiple knowledge bases results in better performance. Multiple knowledge bases can be formulated precisely and in a unified way within the framework of RS. This paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm, Latent Semantic Indexing (LSI), with a Rough Set-based rule generation algorithm, and provides experimental results. The test results show that the hybrid method performs better than the previous rough set-based approach.
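The two rough-set ideas mentioned in the abstract, a minimal keyword set that still separates the classes and several knowledge bases whose answers are combined, can be illustrated with a small sketch. This is not the authors' algorithm: the toy documents, the keyword names, the brute-force search, and the majority vote are all assumptions made for illustration, and the LSI keyword acquisition step is left out entirely.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy training set: binary keyword-presence vectors plus a class label.
KEYWORDS = ["goal", "match", "stock", "market"]
TRAINING = [
    ({"goal": 1, "match": 1, "stock": 0, "market": 0}, "sports"),
    ({"goal": 1, "match": 0, "stock": 0, "market": 0}, "sports"),
    ({"goal": 0, "match": 0, "stock": 1, "market": 1}, "finance"),
    ({"goal": 0, "match": 1, "stock": 1, "market": 0}, "finance"),
]


def project(doc, subset):
    """Restrict a document to the given keyword subset."""
    return tuple(doc[k] for k in subset)


def is_consistent(subset, training):
    """True if no two documents that agree on `subset` carry different classes."""
    seen = {}
    for doc, label in training:
        if seen.setdefault(project(doc, subset), label) != label:
            return False
    return True


def smallest_consistent_subsets(keywords, training):
    """Brute-force search: all consistent keyword subsets of the smallest size."""
    for size in range(1, len(keywords) + 1):
        found = [s for s in combinations(keywords, size)
                 if is_consistent(s, training)]
        if found:
            return found
    return []


def build_rule_base(subset, training):
    """One knowledge base: a lookup from projected keyword values to a class."""
    return {project(doc, subset): label for doc, label in training}


def classify(doc, rule_bases):
    """Combine the answers of several knowledge bases by majority vote (assumed)."""
    votes = Counter()
    for subset, rules in rule_bases:
        label = rules.get(project(doc, subset))
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


subsets = smallest_consistent_subsets(KEYWORDS, TRAINING)
rule_bases = [(s, build_rule_base(s, TRAINING)) for s in subsets]
new_doc = {"goal": 1, "match": 0, "stock": 0, "market": 1}
predicted = classify(new_doc, rule_bases)
```

Each subset found here keeps the training set consistent with as few keywords as possible, so every one of them can serve as its own reduced rule base. The exhaustive search is exponential in the number of keywords and is only workable for a toy example of this size.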
Year
2001
DOI
10.1109/WISE.2001.996486
Venue
WISE
Keywords
hybrid method, rough set theory, latent semantic indexing, rough set-based rule, proposed technique, knowledge base, rough sets theory, information retrieval, information filtering, text classification, hybrid technique, text documents, multiple knowledge base, hybrid text categorization, text categorization, multiple knowledge bases result, hybrid text categorization method, classification, rough set-based hybrid method, rule based, generic algorithm, rough set
Field
Training set, Data mining, Latent semantic indexing, Information retrieval, Computer science, Filter (signal processing), Curse of dimensionality, Rough set, Knowledge base, Text categorization
DocType
Conference
Volume
1
ISBN
0-7695-1393-X
Citations
11
PageRank
0.92
References
7
Authors
5
Name            Order  Citations  PageRank
Yongguang Bao   1      72         10.70
Satoshi Aoyama  2      44         8.57
K. Yamada       3      12         3.32
Naohiro Ishii   4      461        128.62
Xiaoyong Du     5      882        123.29