Title
PU text classification enhanced by term frequency-inverse document frequency-improved weighting
Abstract
Term frequency-inverse document frequency (TF-IDF) is one of the most popular feature (also called term or word) weighting methods used to describe documents in the vector space model and in applications related to text mining and information retrieval. It can effectively reflect the importance of a term in a collection of documents in which all documents play the same role. However, TF-IDF does not take into account how a term's IDF weighting differs when the documents play different roles in the collection, such as the positive and negative training sets in text classification. To address this, this paper presents a novel TF-IDF-improved feature weighting approach, which reflects the importance of the term in the positive and the negative training examples, respectively. We also build a weighted voting classifier by iteratively applying the support vector machine algorithm, and implement one-class support vector machine and Positive Example Based Learning methods for comparison. During classification, an improved 1-DNF algorithm, called 1-DNFC, is also adopted to identify more reliable negative documents from the unlabeled example set. The experimental results show that the term frequency inverse positive-negative document frequency-based classifier outperforms the TF-IDF-based one, and that the weighted voting classifier also outperforms the one-class support vector machine-based and Positive Example Based Learning-based classifiers. Copyright © 2013 John Wiley & Sons, Ltd.
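The abstract does not give the paper's weighting formula, so the following is only a minimal sketch of the general idea: computing a term's IDF separately over the positive and the negative training sets instead of once over the whole collection. The function names, the smoothed IDF variant, and the toy documents are all assumptions made for illustration and should not be read as the authors' method.

```python
import math
from collections import Counter


def tf(term, doc):
    """Relative term frequency of `term` in a tokenized document."""
    if not doc:
        return 0.0
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log((1 + n) / (1 + df)) + 1.0


def class_aware_weights(term, doc, positive_docs, negative_docs):
    """Hypothetical helper: return (TF * IDF_positive, TF * IDF_negative).

    Plain TF-IDF would use a single IDF over positive_docs + negative_docs;
    here the two training sets are kept apart so that a term common in one
    set but rare in the other receives two different weights.
    """
    t = tf(term, doc)
    return t * idf(term, positive_docs), t * idf(term, negative_docs)


# Toy tokenized training sets (illustrative data only).
positive_docs = [["svm", "kernel"], ["svm", "margin"]]
negative_docs = [["football", "goal"], ["goal", "kernel"]]
doc = ["svm", "svm", "goal"]

pos_w, neg_w = class_aware_weights("svm", doc, positive_docs, negative_docs)
plain_w = tf("svm", doc) * idf("svm", positive_docs + negative_docs)
```

In this toy data "svm" appears in every positive document and in no negative one, so its positive-set IDF is low and its negative-set IDF is high, while the single collection-wide weight falls between the two.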
Year
2014
DOI
10.1002/cpe.3040
Venue
Concurrency and Computation: Practice & Experience
Keywords
tf-idf, 1-dnfc, classification, tfipndf, wvc
Field
Structured support vector machine, Data mining, Weighting, tf–idf, Computer science, Support vector machine, Artificial intelligence, Vector space model, Margin classifier, Linear classifier, Machine learning, Quadratic classifier
DocType
Journal
Volume
26
Issue
3
ISSN
1532-0626
Citations
11
PageRank
0.66
References
17
Authors
3
Name        Order  Citations  PageRank
Tao Peng    1      98         12.70
Lu Liu      2      28         4.39
Wanli Zuo   3      342        42.73