Abstract |
---|
Automated feature selection is important for text categorization: it reduces the feature dimensionality and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, the Kullback-Leibler divergence and the Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called the Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination ($MD$) and $MD$-$\chi^2$ methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches. |
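The abstract only names the divergence measures it builds on. The snippet below is a minimal, hypothetical sketch of divergence-based feature ranking, not the paper's $MD$, $MD$-$\chi^2$, or JMH procedures: it applies the standard definitions of the Kullback-Leibler and Jeffreys divergences to score the terms of a binary-labeled document-term count matrix. The helper `rank_features_by_divergence` and its two-bin per-term class-conditional distributions are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys_divergence(p, q):
    """Symmetric Jeffreys divergence: J(p, q) = D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def rank_features_by_divergence(term_counts, labels):
    """Rank features (columns of a document-term count matrix) by the Jeffreys
    divergence between the two class-conditional term distributions (binary case)."""
    X = np.asarray(term_counts, dtype=float)
    y = np.asarray(labels)
    pos = X[y == 1].sum(axis=0)          # per-term counts in the positive class
    neg = X[y == 0].sum(axis=0)          # per-term counts in the negative class
    scores = np.array([
        jeffreys_divergence([pos[j], pos.sum() - pos[j]],
                            [neg[j], neg.sum() - neg[j]])
        for j in range(X.shape[1])
    ])
    return np.argsort(scores)[::-1]      # feature indices, most discriminative first

# Toy usage: 4 documents x 3 terms, two classes.
X = np.array([[3, 0, 1], [2, 0, 2], [0, 4, 1], [0, 3, 0]])
y = np.array([1, 1, 0, 0])
print(rank_features_by_divergence(X, y))
```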
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/TKDE.2016.2563436 | IEEE Trans. Knowl. Data Eng. |
Keywords | DocType | Volume
---|---|---
Feature selection, Jeffreys divergence, Jeffreys-Multi-Hypothesis divergence, Kullback-Leibler divergence, text categorization | Journal | abs/1602.02850
Issue | ISSN | Citations
---|---|---
9 | 1041-4347 | 38
PageRank | References | Authors
---|---|---
1.08 | 41 | 3