Abstract |
---|
With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization, and the k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for this task. However, kNN suffers from limitations such as a high computational cost when classifying new instances, because each new document must be compared against the stored training documents. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous work has evaluated these approaches only on structured datasets, and their performance has not been examined in the text categorization domain, where both the dimensionality and the size of the datasets are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of several aspects, including classification accuracy, classification efficiency, and data reduction. |
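The abstract does not name the instance selection methods the paper evaluates, so the following is only a minimal illustrative sketch under stated assumptions. Documents are plain term-frequency vectors compared with cosine similarity. Hart's Condensed Nearest Neighbor (CNN) rule stands in as one well-known instance selection technique, not necessarily one used in the study. All function names and the toy documents are hypothetical. The sketch shows how selection shrinks the stored training set that kNN must scan for every new document, and how a data reduction rate can be measured.

```python
import math
from collections import Counter

# Assumption: raw term-frequency vectors (no stemming, stop-word removal, or TF-IDF weighting).
def tf_vector(text):
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Majority vote among the k stored (vector, label) pairs most similar to query.
    Every prediction computes a similarity against the whole stored set, which is
    the cost that instance selection tries to reduce."""
    neighbors = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def condensed_nn(train):
    """Hart's Condensed Nearest Neighbor rule (illustrative choice of instance selection):
    start from the first instance, add every instance that 1-NN on the kept subset
    misclassifies, and repeat passes until no instance is added."""
    kept = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(train):
            if i in kept:
                continue
            subset = [train[j] for j in kept]
            if knn_predict(subset, vec, k=1) != label:
                kept.append(i)
                changed = True
    return [train[j] for j in kept]

# Hypothetical toy corpus with two categories.
docs = [
    ("the team won the football match", "sport"),
    ("the striker scored a late goal in the match", "sport"),
    ("fans cheered as the team won the cup", "sport"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell as interest rates rose", "finance"),
    ("investors sold shares after the bank report", "finance"),
]
train = [(tf_vector(text), label) for text, label in docs]

reduced = condensed_nn(train)
reduction = 1 - len(reduced) / len(train)  # fraction of training documents discarded
print(f"kept {len(reduced)} of {len(train)} documents, reduction rate {reduction:.2f}")
# prints "kept 2 of 6 documents, reduction rate 0.67"
print(knn_predict(reduced, tf_vector("the team scored a goal"), k=1))
# prints "sport"
```

CNN discards documents that the kept subset already classifies correctly, so later queries compare against fewer stored vectors; whether such reduction preserves accuracy on high-dimensional text is the kind of trade-off the abstract describes examining.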
Field | Value |
---|---|
Year | 2018 |
DOI | 10.3745/JIPS.02.0080 |
Venue | Journal of Information Processing Systems |
Document type | Journal |
Volume | 14 |
Issue | 2 |
ISSN | 1976-913X |
Keywords | Classification Accuracy, Classification Efficiency, Data Reduction, Instance Selection, k-Nearest Neighbors, Text Categorization |
Citations | 0 |
References | 0 |
Authors | 1 |
PageRank | 0.34 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fatiha Barigou | 1 | 14 | 6.76 |