Title
Combining supervised term-weighting metrics for SVM text classification with extended term representation.
Abstract
The accuracy of a text classification method based on an SVM learner depends on the weighting metric used to assign a weight to each term. Weighting metrics can be classified as supervised or unsupervised, according to whether they use prior information on the number of documents belonging to each category. A supervised metric should be highly informative about the relation between a document term and a category, and discriminative in separating the positive documents of this category from the negative ones. In this paper, we propose 80 metrics never before used for the term-weighting problem and compare them to 16 functions from the literature. Many of these metrics were initially proposed for other data mining problems: feature selection, classification rules and term collocations. While many previous works have shown the merits of using a particular metric, our experience suggests that the results obtained with such metrics can be highly dependent on the label distribution of the corpus and on the performance measure used (microaveraged or macroaveraged $$F_1$$-score). The solution we propose is to combine the metrics in order to improve the classification. More precisely, we show that an SVM classifier that combines the outputs of SVM classifiers using different metrics performs well in all situations. The second main contribution of this paper is an extended term representation for the vector space model that significantly improves the prediction of the text classifier.
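To make the supervised/unsupervised distinction concrete, here is a minimal Python sketch. It is our own illustration under stated assumptions, not the authors' implementation. It counts the four cells of the term/category contingency table and weights each term as tf × metric. Two metrics are contrasted: idf, which is unsupervised, and relevance frequency (rf, from Lan et al.), which is supervised. The paper does not say whether rf is among the 16 literature functions it evaluates; rf is used here only as a well-known supervised example. The function names `contingency`, `idf`, `rf` and `weight_vector`, along with the toy corpus, are hypothetical.

```python
import math
from collections import Counter


def contingency(docs, labels, term, category):
    """Count documents in the 2x2 term/category table (hypothetical helper).

    a: positive docs containing the term, b: negative docs containing it,
    c: positive docs without it,          d: negative docs without it.
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        positive = label == category
        if has_term and positive:
            a += 1
        elif has_term:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def idf(a, b, c, d):
    # Unsupervised: uses only the document frequency a + b, ignoring labels.
    n = a + b + c + d
    return math.log2(n / (a + b)) if a + b else 0.0


def rf(a, b, c, d):
    # Supervised relevance frequency (Lan et al.): grows when the term
    # occurs mostly in positive documents of the category.
    return math.log2(2 + a / max(1, b))


def weight_vector(doc, vocab, docs, labels, category, metric):
    # tf x metric weight for every vocabulary term, relative to one category.
    tf = Counter(doc)
    return [tf[t] * metric(*contingency(docs, labels, t, category)) for t in vocab]


# Toy corpus (hypothetical): tokenized documents with one label each.
docs = [["ball", "goal", "team"], ["goal", "match"], ["vote", "party"], ["party", "team"]]
labels = ["sport", "sport", "politics", "politics"]
vocab = ["goal", "team", "party"]

print(weight_vector(docs[0], vocab, docs, labels, "sport", idf))  # [1.0, 1.0, 0.0]
print(weight_vector(docs[0], vocab, docs, labels, "sport", rf))   # approx. [2.0, 1.585, 0.0]
```

On this toy corpus, idf gives "goal", "team" and "party" the same weight, because each occurs in two of the four documents. For the category "sport", rf instead ranks the terms as follows: "goal" (only in sport documents) first, then "team" (split across both categories), then "party" (only in politics documents). The combination scheme described above would train one SVM per metric on vectors like these and feed their outputs to a second SVM. That step is not shown here.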
Year
2016
DOI
10.1007/s10115-016-0924-1
Venue
Knowledge and Information Systems
Keywords
Text classification, Term weighting, Text representation, Support vector machines, Classifier combination
Field
Data mining, Weighting, Pattern recognition, Feature selection, Computer science, Support vector machine, Artificial intelligence, Svm classifier, Vector space model, Classifier (linguistics), Discriminative model, Machine learning
DocType
Journal
Volume
49
Issue
3
ISSN
0219-3116
Citations
10
PageRank
0.48
References
23
Authors
4
Name | Order | Citations | PageRank
Mounia Haddoud | 1 | 18 | 1.66
Aïcha Mokhtari | 2 | 46 | 11.97
Thierry Lecroq | 3 | 662 | 58.52
Saïd Abdeddaïm | 4 | 10 | 0.48