A Text Categorization Method using Extended Vector Space Model by Frequent Term Sets. - Citegraph

Paper Info

Title
A Text Categorization Method using Extended Vector Space Model by Frequent Term Sets.

Abstract
Text categorization is one of the most important research topics in Natural Language Processing and Information Retrieval because of the ever-increasing volume of electronic documents. This paper presents a new text categorization method using frequent term sets. A novel constraint measure, AD-Sup, is introduced to extract discriminative features from frequent term sets for the classification task. Text documents are then represented in a global feature space that contains both single terms and frequent term sets. To solve the sparse instance problem, a term weighting strategy is implemented that assigns estimated weights using feature similarity and greatly reduces the sparsity rate. Through extensive experiments, the optimal proportion of single-term features and frequent-term-set features is determined empirically. Classification results on the Reuters-21578 and WebKB corpora demonstrate that the AD-Sup constraint is effective at extracting useful frequent features and that the combination strategy builds a better feature space and improves the SVM classifier.
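
The general idea of extending a single-term vector space with frequent term sets can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not define AD-Sup or the similarity-based weighting, so neither is reproduced here. The sketch assumes a plain minimum-support count in an Apriori-style search and binary feature weights, and the function names `frequent_term_sets` and `extended_vector` are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Apriori-style mining: return {frozenset(terms): document count}
    for every term set of size 1..max_size found in >= min_support docs."""
    doc_sets = [set(d) for d in docs]

    # Level 1: count the documents containing each single term.
    counts = {}
    for d in doc_sets:
        for term in d:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_support}
    frequent = {s: counts[s] for s in current}

    size = 1
    while current and size < max_size:
        size += 1
        # Join step: merge frequent sets of the previous size.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, size - 1))}
        level = {}
        for c in candidates:
            n = sum(1 for d in doc_sets if c <= d)
            if n >= min_support:
                level[c] = n
        frequent.update(level)
        current = set(level)
    return frequent


def extended_vector(doc, single_terms, term_sets):
    """Binary vector: single-term features first, then term-set features.
    (Assumption: binary weights stand in for the paper's weighting.)"""
    d = set(doc)
    return ([1 if t in d else 0 for t in single_terms]
            + [1 if s <= d else 0 for s in term_sets])


# Toy corpus, invented for illustration only.
docs = [
    ["oil", "price", "rise"],
    ["oil", "price", "fall"],
    ["wheat", "price", "rise"],
]
frequent = frequent_term_sets(docs, min_support=2)
single_terms = sorted(t for s in frequent if len(s) == 1 for t in s)
term_sets = sorted((s for s in frequent if len(s) > 1), key=sorted)
vectors = [extended_vector(d, single_terms, term_sets) for d in docs]
```

The resulting vectors could be passed to any standard classifier. A discriminative filter such as AD-Sup would sit between the mining step and the vector construction, selecting which term sets enter the feature space.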

Year	2013
DOI	(none listed)
Venue	Journal of Information Science and Engineering
Keywords	text categorization, text representation, frequent term sets, Apriori, SVM
Field	Feature vector, Weighting, Pattern recognition, Computer science, Artificial intelligence, SVM classifier, Vector space model, Text categorization, Discriminative model, Machine learning
DocType	Journal
Volume	29
Issue	SP1
ISSN	1016-2364
Citations	5
PageRank	0.45
References	17
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Man Yuan	1	5	0.45
Yuanxin Ouyang	2	121	21.57
Zhang Xiong	3	1069	102.45