Title
A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization
Abstract
Feature selection, which can reduce the dimensionality of the vector space without sacrificing classifier performance, is widely used in text categorization. In this paper, we propose a new feature selection algorithm, named CMFS, which comprehensively measures the significance of a term both within a category (intra-category) and across categories (inter-category). We evaluated CMFS on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two classification algorithms, Naive Bayes (NB) and Support Vector Machines (SVMs). The experimental results, comparing CMFS with six well-known feature selection algorithms, show that CMFS is significantly superior to Information Gain (IG), the Chi statistic (CHI), Document Frequency (DF), Orthogonal Centroid Feature Selection (OCFS) and the DIA association factor (DIA) when the Naive Bayes classifier is used, and significantly outperforms IG, DF, OCFS and DIA when Support Vector Machines are used.
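This record does not reproduce the CMFS scoring formula. As a rough illustration of the idea stated in the abstract (scoring a term by combining its weight inside a category with how concentrated it is in that category relative to the others), the sketch below multiplies smoothed estimates of P(t|c) and P(c|t) and keeps the top-scoring terms. The function names `score_terms` and `select_top_k`, the add-one smoothing, the maximum over categories, and the toy corpus are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter


def score_terms(docs, labels):
    """Score each term by max over categories of P(t|c) * P(c|t).

    Assumption for illustration: probabilities are estimated from raw
    term frequencies with add-one smoothing.
    """
    categories = sorted(set(labels))
    vocab = sorted({term for doc in docs for term in doc})

    # tf[c][t] = number of occurrences of term t in documents of category c
    tf = {c: Counter() for c in categories}
    for doc, c in zip(docs, labels):
        tf[c].update(doc)

    scores = {}
    for t in vocab:
        total_t = sum(tf[c][t] for c in categories)
        best = 0.0
        for c in categories:
            total_c = sum(tf[c].values())
            # How much of category c's text is made up of term t
            p_t_given_c = (tf[c][t] + 1) / (total_c + len(vocab))
            # How much of term t's usage falls in category c
            p_c_given_t = (tf[c][t] + 1) / (total_t + len(categories))
            best = max(best, p_t_given_c * p_c_given_t)
        scores[t] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus: each document is a list of tokens (hypothetical data).
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "score"],
    ["stock", "market", "price"],
    ["market", "price", "team"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = score_terms(docs, labels)
selected = select_top_k(scores, 3)
```

In this toy corpus, a term such as "team" that appears in both categories scores lower than a term of similar frequency confined to one category, which is the behaviour a combined intra-category and inter-category measure is meant to capture.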
Year
2012
DOI
10.1016/j.ipm.2011.12.005
Venue
Inf. Process. Manage.
Keywords
feature selection, support vector machines, dia association factor, chi statistic, comprehensive measurement, new feature selection algorithm, well-known feature selection algorithm, naive bayes classifier, text categorization, document frequency, naive bayes
Field
Data mining, Feature selection, Computer science, Artificial intelligence, Classifier (linguistics), Naive Bayes classifier, Pattern recognition, Statistic, Support vector machine, Curse of dimensionality, Statistical classification, Machine learning, Centroid
DocType
Journal
Volume
48
Issue
4
ISSN
0306-4573
Citations
40
PageRank
1.04
References
27
Authors
5
Name
Order
Citations
PageRank
Jieming Yang1743.64
Yuan-Ning Liu216022.94
Xiaodong Zhu37310.24
Zhen Liu412216.50
Xiaoxu Zhang5713.60