Title
An Effective Method To Improve kNN Text Classifier
Abstract
Many standard classification algorithms assume that the training examples are evenly distributed among the classes. However, unbalanced data sets are common in practice. kNN, a simple and effective categorization method, is widely used, but it also suffers when the data set is biased. While developing the Prototype of Internet Information Security for the Shanghai Council of Information and Security, we found that when the training data set is biased, almost all test documents of some rare categories are classified into common ones. To alleviate this problem, we propose a novel concept, the critical point (CP), and adapt traditional kNN by integrating the approximate value of CP (LB or UB) and the training number into its decision rules. Extensive experiments show that the adapted kNN significantly improves classification performance on biased corpora.
Year: 2007
DOI: 10.1109/SNPD.2007.296
Venue: SNPD (1)
Keywords: traditional knn, standard classification, unbalanced data, significant classification performance improvement, training number, improve knn text classifier, shanghai council, knn, training data set, text analysis, classification algorithms, training example, internet information security, text classifier, critical point, effective method, decision rule, information security
Field: Decision rule, Data mining, Categorization, Data set, Computer science, Information security, Artificial intelligence, Statistical classification, Classifier (linguistics), Machine learning, Performance improvement, The Internet
DocType: Conference
Volume: 1
ISBN: 978-0-7695-2909-7
Citations: 4
PageRank: 0.51
References: 18
Authors: 4
Name              Order  Citations  PageRank
Xiulan Hao        1      22         3.91
Xiaopeng Tao      2      26         4.14
Chenghong Zhang   3      116        18.03
Yunfa Hu          4      74         13.44