Title
A global evaluation criterion for feature selection in text categorization using Kullback-Leibler divergence.
Abstract
A major difficulty in text categorization is the extremely high dimensionality of the text feature space. Feature selection techniques are therefore desirable in large-scale text categorization tasks for improving accuracy and efficiency. The χ² statistic and the simplified χ² statistic are two effective feature selection methods in text categorization. With either criterion, one computes the local score of a term for each category and usually takes the maximum or the average of these scores as the global term-goodness criterion. However, there is no explicit guidance on whether to choose the maximum or the average; moreover, neither operation reflects the degree of scatter of a term over all categories. In this paper, we propose a new global feature evaluation criterion based on Kullback-Leibler (KL) divergence for choosing informative terms, since KL divergence is widely used to measure the difference between the distributions of two categories. We conduct experiments on the Reuters-21578 corpus with a k-NN classifier to test the performance of the proposed method. The experimental results show that the method enhances the performance of text categorization: it performs similarly to or better than the conventional maximum and average criteria on either Macro-F1 or Micro-F1. © 2011 IEEE.
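The local-to-global scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses the standard χ² contingency-table formula, assumes each document belongs to exactly one category (Reuters-21578 is in fact multi-label), and all function and parameter names are hypothetical. The abstract does not state the proposed KL-based criterion, so `kl_divergence` shows only the generic definition of KL divergence between two discrete distributions.

```python
import math

def chi_square(a, b, c, d):
    """Standard chi-square score of a term for one category.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: treat the term as uninformative
    return n * (a * d - c * b) ** 2 / denom

def local_scores(term_counts, class_sizes):
    """Chi-square score of one term for every category.

    term_counts[i]: docs in category i that contain the term
    class_sizes[i]: total docs in category i (single-label assumption)
    """
    n = sum(class_sizes)
    df = sum(term_counts)  # document frequency of the term
    scores = []
    for a, size in zip(term_counts, class_sizes):
        b = df - a
        c = size - a
        d = n - size - b
        scores.append(chi_square(a, b, c, d))
    return scores

def global_max(scores):
    """Global criterion: maximum of the local scores."""
    return max(scores)

def global_avg(scores, class_sizes):
    """Global criterion: local scores averaged with category priors as weights."""
    n = sum(class_sizes)
    return sum(size / n * s for s, size in zip(scores, class_sizes))

def kl_divergence(p, q, eps=1e-12):
    """Generic KL divergence D(p || q) for discrete distributions.

    eps guards against division by zero; it is an assumption of this sketch.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

For example, `local_scores([10, 0], [10, 10])` scores a term that appears in every document of the first category and in none of the second; `global_max` and `global_avg` then collapse those local scores into a single value, which is the step the abstract argues loses information about how the term is scattered across categories.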
Year: 2011
DOI: 10.1109/SoCPaR.2011.6089284
Venue: SoCPaR
Keywords: chi-square statistic, feature selection, global evaluation criterion, kullback-leibler divergence, text categorization, text analysis, feature extraction, machine learning, support vector machine, kullback leibler divergence, kullback leibler, feature space
Field: Feature vector, Text mining, Feature selection, Pattern recognition, Statistic, Computer science, Curse of dimensionality, Feature extraction, Artificial intelligence, Classifier (linguistics), Kullback–Leibler divergence, Machine learning
DocType: Conference
Volume: null
Issue: null
Citations: 6
PageRank: 0.48
References: 22
Authors: 4
Name            Order   Citations   PageRank
Zhilong Zhen    1       6           1.15
Xiaoqin Zeng    2       407         32.97
Haijuan Wang    3       6           0.81
Lixin Han       4       135         14.47