Title
An evaluation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification
Abstract
Stemming is one of the most significant preprocessing stages in text categorization, and many academic investigators work on it to improve and optimize the accuracy of the classification task. The high dimensionality of the feature space is one of the challenges in text classification, and it can be reduced by many techniques. Stemming reduces this dimensionality by grouping words that share the same grammatical forms and then extracting their root. This work is dedicated to building an approach for Kurdish language classification using the Reber stemmer. An innovative approach is investigated that obtains the stem of Kurdish words by removing their longest suffixes and prefixes. The approach is capable of deleting as many of the required affixes as possible to obtain the stem of words in Kurdish. The advantage of this stemmer is that it ignores the ordering of the affix list, so it yields the correct stem for more than one word with the same format. The stemming technique is implemented on the KDC-4007 dataset, which consists of eight classes. Support Vector Machine (SVM) and Decision Tree (DT, or C4.5) are used for classification. The stemmer is compared with the Longest-Match stemming technique. According to the results, the F-measure of both the Reber stemmer and the Longest-Match method is higher with SVM than with DT. With SVM, the Reber stemmer obtained a higher F-measure for the religion, sport, health and education classes, while the remaining classes showed a lower F-measure with Longest-Match. With DT, the Reber stemmer obtained a higher F-measure for the religion, sport and art classes, while the remaining classes showed a lower F-measure with Longest-Match.
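The idea of removing the longest matching prefix and suffix can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `strip_longest`, the `min_stem` guard, and the affix lists are all hypothetical, and the affixes are English-like placeholders rather than the Kurdish affixes the Reber stemmer uses. Sorting candidates by length is one assumed way to make the result independent of the order in which affixes are listed.

```python
# Hypothetical affix lists for illustration only; these are NOT the
# Kurdish affixes used by the Reber stemmer.
PREFIXES = ["un", "re"]
SUFFIXES = ["s", "ing", "ed", "ings"]


def strip_longest(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_stem=3):
    """Remove the longest matching prefix, then the longest matching suffix.

    min_stem is an assumed guard that keeps very short words intact.
    """
    # Trying longer affixes first makes the outcome independent of the
    # order in which the affix lists were written.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["rebuildings", "played", "sing"]:
        print(w, "->", strip_longest(w))
```

With these placeholder lists, `rebuildings` reduces to `build` because `ings` is tried before `ing` and `s`, while `sing` is left unchanged because stripping `ing` would leave a stem shorter than `min_stem`.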
Year
2018
DOI
10.1007/s42044-018-0007-4
Venue
Iran Journal of Computer Science
Keywords
Kurdish text classification, Stemming, Support vector machine, Decision tree
DocType
Journal
Volume
1
Issue
2
ISSN
2520-8446
Citations
0
PageRank
0.34
References
1
Authors
6