Abstract |
---|
Automated feature selection is important for text categorization: it reduces the feature dimensionality and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, the Kullback-Leibler divergence and the Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called the Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination ($MD$) and $MD$-$\chi^2$ methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches. |
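The abstract only names the divergence measures it builds on. The snippet below is a minimal, hypothetical sketch of divergence-based feature ranking, not the paper's $MD$, $MD$-$\chi^2$, or JMH procedures: it applies the standard definitions of the Kullback-Leibler and Jeffreys divergences to score the terms of a binary-labeled document-term count matrix. The helper `rank_features_by_divergence` and its two-bin per-term class-conditional distributions are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys_divergence(p, q):
    """Symmetric Jeffreys divergence: J(p, q) = D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def rank_features_by_divergence(term_counts, labels):
    """Rank features (columns of a document-term count matrix) by the Jeffreys
    divergence between the two class-conditional term distributions (binary case)."""
    X = np.asarray(term_counts, dtype=float)
    y = np.asarray(labels)
    pos = X[y == 1].sum(axis=0)          # per-term counts in the positive class
    neg = X[y == 0].sum(axis=0)          # per-term counts in the negative class
    scores = np.array([
        jeffreys_divergence([pos[j], pos.sum() - pos[j]],
                            [neg[j], neg.sum() - neg[j]])
        for j in range(X.shape[1])
    ])
    return np.argsort(scores)[::-1]      # feature indices, most discriminative first

# Toy usage: 4 documents x 3 terms, two classes.
X = np.array([[3, 0, 1], [2, 0, 2], [0, 4, 1], [0, 3, 0]])
y = np.array([1, 1, 0, 0])
print(rank_features_by_divergence(X, y))
```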
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/TKDE.2016.2563436 | IEEE Trans. Knowl. Data Eng. |
Keywords | DocType | Volume
---|---|---
Feature selection, Jeffreys divergence, Jeffreys-Multi-Hypothesis divergence, Kullback-Leibler divergence, text categorization | Journal | abs/1602.02850
Issue | ISSN | Citations
---|---|---
9 | 1041-4347 | 38
PageRank | References | Authors
---|---|---
1.08 | 41 | 3