Title
A Text Categorization Method using Extended Vector Space Model by Frequent Term Sets.
Abstract
Text categorization is one of the most important research topics in Natural Language Processing and Information Retrieval because of the ever-increasing volume of electronic documents. This paper presents a new text categorization method using frequent term sets. A novel constraint measure, AD-Sup, is introduced to extract discriminative features from frequent term sets for the classification task. Text documents are then represented in a global feature space that contains both single terms and frequent term sets. To address the sparse instance problem, a term weighting strategy assigns estimated weights based on feature similarity, which substantially reduces the sparsity rate. The optimal proportion of single-term features and frequent term set features is determined empirically through extensive experiments. Classification results on the Reuters-21578 and WebKB corpora demonstrate that the AD-Sup constraint is effective at extracting useful frequent features, and that the combination strategy builds a better feature space and improves the SVM classifier.
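The extended representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define AD-Sup or the similarity-based weighting, so this sketch assumes a plain minimum-support threshold in an Apriori-style search and uses binary weights. The names `frequent_term_sets`, `extended_vector`, `min_support`, and `max_size` are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Apriori-style mining; returns {frozenset(terms): document support count}.

    Assumption: a plain support threshold stands in for the paper's AD-Sup
    constraint, which the abstract does not define.
    """
    doc_sets = [set(d) for d in docs]

    # Level 1: count the documents containing each single term.
    counts = {}
    for d in doc_sets:
        for t in d:
            key = frozenset([t])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    size = 2
    while current and size <= max_size:
        # Join frequent (size-1)-sets; keep a candidate only if all of its
        # (size-1)-subsets are frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        # Count support for the surviving candidates.
        current = {}
        for c in candidates:
            support = sum(1 for d in doc_sets if c <= d)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        size += 1
    return frequent


def extended_vector(doc, single_terms, term_sets):
    """Binary vector: single-term features followed by term-set features.

    Assumption: binary weights replace the paper's similarity-based weighting.
    """
    present = set(doc)
    vec = [1 if t in present else 0 for t in single_terms]
    vec += [1 if s <= present else 0 for s in term_sets]
    return vec
```

The resulting vectors could then be passed to any SVM implementation. How many single-term and term-set features to keep is a tuning choice that the paper determines empirically.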
Year: 2013
DOI: null
Venue: JOURNAL OF INFORMATION SCIENCE AND ENGINEERING
Keywords: text categorization, text representation, frequent term sets, Apriori, SVM
Field: Feature vector, Weighting, Pattern recognition, Computer science, Artificial intelligence, SVM classifier, Vector space model, Text categorization, Discriminative model, Machine learning
DocType: Journal
Volume: 29
Issue: SP1
ISSN: 1016-2364
Citations: 5
PageRank: 0.45
References: 17
Authors: 3
Name | Order | Citations | PageRank
Man Yuan | 1 | 5 | 0.45
Yuanxin Ouyang | 2 | 121 | 21.57
Zhang Xiong | 3 | 1069 | 102.45