Title
Terms-based discriminative information space for robust text classification.
Abstract
With the popularity of Web 2.0, there has been a phenomenal increase in the utility of text classification in applications like document filtering and sentiment categorization. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using the terms in the documents only. In this paper, we propose a novel and efficient method using terms-based discriminative information space for robust text classification. Terms in the documents are assigned weights according to the discrimination information they provide for one category over the others. These weights also serve to partition the terms into category sets. A linear opinion pool is adopted for combining the discrimination information provided by each set of terms to yield a feature space (discriminative information space) having dimensions equal to the number of classes. Subsequently, a discriminant function is learned to categorize the documents in the feature space. This classification methodology relies upon corpus information only, and is robust to distribution shifts and noise. We develop theoretical parallels of our methodology with generative, discriminative, and hybrid classifiers. We evaluate our methodology extensively with five different discriminative term weighting schemes on six data sets from different application areas. We give a side-by-side comparison with four well-known text classification techniques. The results show that our methodology consistently outperforms the rest, especially when there is a distribution shift from training to test sets. Moreover, our methodology is simple and effective for different application domains and training set sizes. It is also fast with a small and tunable memory footprint.
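The pipeline described in the abstract (discriminative term weights, partitioning terms into category sets, pooling each set into one feature per class, then classifying in that space) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the ratio-style weight with add-one smoothing, the equal-weight average used as the linear opinion pool, the plain argmax that stands in for the learned discriminant function, and all function names and the toy corpus.

```python
from collections import Counter


def fit_term_weights(docs, labels, smoothing=1.0):
    """Weight each term by how strongly it favours one class over the rest.

    Assumption: the weight is a smoothed ratio p(term | class) / p(term | rest),
    computed from document frequencies. The paper evaluates five weighting
    schemes; this ratio is only a stand-in for illustration.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)
    total = len(docs)
    # Document frequency of every term inside each class.
    df = {c: Counter() for c in classes}
    for tokens, y in zip(docs, labels):
        df[y].update(set(tokens))
    vocab = set().union(*(df[c].keys() for c in classes))

    # Partition the vocabulary: a term joins the class it discriminates for
    # most strongly, and only if its weight exceeds 1 (i.e. it favours that class).
    term_sets = {c: {} for c in classes}
    for t in vocab:
        best_c, best_w = None, 1.0
        for c in classes:
            p_in = (df[c][t] + smoothing) / (n_docs[c] + 2 * smoothing)
            df_rest = sum(df[o][t] for o in classes if o != c)
            p_rest = (df_rest + smoothing) / (total - n_docs[c] + 2 * smoothing)
            w = p_in / p_rest
            if w > best_w:
                best_c, best_w = c, w
        if best_c is not None:
            term_sets[best_c][t] = best_w
    return classes, term_sets


def to_information_space(tokens, classes, term_sets):
    """Map a document to one pooled score per class.

    Assumption: the linear opinion pool gives every distinct term in the
    document the same pooling weight, 1 / (number of distinct terms).
    """
    terms = set(tokens)
    n = max(len(terms), 1)
    return [sum(term_sets[c].get(t, 0.0) for t in terms) / n for c in classes]


def classify(tokens, classes, term_sets):
    """Pick the class with the largest pooled score.

    Assumption: argmax replaces the learned discriminant function.
    """
    scores = to_information_space(tokens, classes, term_sets)
    return classes[max(range(len(classes)), key=scores.__getitem__)]


# Hypothetical toy corpus, for illustration only.
train_docs = [
    ["cheap", "pills", "offer"],
    ["meeting", "agenda", "notes"],
    ["cheap", "offer", "now"],
    ["project", "meeting", "notes"],
]
train_labels = ["spam", "ham", "spam", "ham"]

classes, term_sets = fit_term_weights(train_docs, train_labels)
query = ["cheap", "meeting", "offer", "offer"]
print(classes, to_information_space(query, classes, term_sets))
print(classify(query, classes, term_sets))
```

Whatever the vocabulary size, each document is reduced to a vector with one entry per class, which is the property the abstract highlights. A learned discriminant in that low-dimensional space would replace the argmax in a fuller implementation.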
Year: 2016
DOI: 10.1016/j.ins.2016.08.073
Venue: Information Sciences
Keywords: Text classification, Discriminative term weights, Linear opinion pooling, Feature construction
Field: Categorization, Feature vector, Data set, Weighting, Pattern recognition, Artificial intelligence, Information space, Memory footprint, Discriminative model, Machine learning, Mathematics, Discriminant function analysis
DocType: Journal
Volume: 372
Issue: C
ISSN: 0020-0255
Citations: 1
PageRank: 0.35
References: 50
Authors: 4
Name, Order, Citations, PageRank
Khurum Nazir Junejo, 1, 57, 6.08
Asim Karim, 2, 136, 18.04
Malik Tahir Hassan, 3, 19, 4.77
Moongu Jeon, 4, 456, 72.81