Title
A Group-Based Feature Selection Approach to Improve Classification of Holy Quran Verses.
Abstract
Most existing feature selection approaches are limited to determining features from a single source of data. In this paper, a group-based feature selection (GBFS) approach is proposed that considers multiple sources of textual data. The proposed GBFS approach is applied to label Quranic verses based on two major references: the English translation and the tafsir (commentary). The verses were selected from two chapters, Surah Al-Baqarah and Surah Al-Anaam, and are classified into three categories: Faith, Worship, and Etiquette. The textual data from the translation and commentary were preprocessed using the StringToWordVector filter with TF-IDF weighting. Five feature selection algorithms (information gain, chi-square, Pearson correlation coefficient, Relief, and correlation-based feature selection) were evaluated with four classifiers: naive Bayes, LibSVM, k-NN, and decision trees (J48). The proposed approach shows promising results in terms of accuracy and area under the receiver operating characteristic (ROC) curve (AUC), achieving an accuracy of 94.5% and an AUC of 0.944.
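The abstract does not spell out how GBFS combines the two sources, so the following is only a minimal sketch of one plausible reading: rank terms separately within each source group (translation, tafsir), keep the top k from each, and concatenate their TF-IDF weights into one feature vector per verse. It is not the authors' implementation and does not use Weka. The function names, the parameter k, and the toy token lists are all hypothetical, and chi-square stands in here for any of the five rankers listed above.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def chi_square(docs, labels, term):
    """Chi-square statistic between term presence and the class label."""
    n = len(docs)
    present = [term in doc for doc in docs]
    score = 0.0
    for cls in sorted(set(labels)):
        for state in (True, False):
            observed = sum(
                1 for p, y in zip(present, labels) if p == state and y == cls
            )
            expected = present.count(state) * labels.count(cls) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def select_top_k(docs, labels, k):
    """Rank the vocabulary of one source by chi-square and keep k terms."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t), reverse=True)
    return ranked[:k]


def group_based_features(groups, labels, k):
    """Assumed grouping rule: select k terms per source, tagged by source."""
    selected = []
    for name, docs in groups.items():
        selected += [(name, t) for t in select_top_k(docs, labels, k)]
    return selected


def build_vectors(groups, selected):
    """Concatenate the tf-idf weights of the selected terms for each verse."""
    weights = {name: tfidf(docs) for name, docs in groups.items()}
    n = len(next(iter(groups.values())))
    return [[weights[name][i].get(t, 0.0) for name, t in selected] for i in range(n)]


# Hypothetical toy tokens, not real verse text; one entry per verse per source.
groups = {
    "translation": [
        ["believe", "unseen"], ["believe", "lord"],
        ["pray", "charity"], ["pray", "fast"],
        ["speak", "kindly"], ["speak", "parents"],
    ],
    "tafsir": [
        ["faith", "heart", "explains"], ["faith", "oneness", "explains"],
        ["ritual", "prayer", "explains"], ["ritual", "fasting", "explains"],
        ["manners", "speech", "explains"], ["manners", "family", "explains"],
    ],
}
labels = ["Faith", "Faith", "Worship", "Worship", "Etiquette", "Etiquette"]

features = group_based_features(groups, labels, k=3)
vectors = build_vectors(groups, features)
print(features)
```

In this toy setup, a term that appears in every tafsir entry ("explains") scores zero and is never selected, while each source contributes its own class-discriminating terms. The resulting vectors could then be passed to any of the four classifiers named in the abstract.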
Year
2018
DOI
10.1007/978-3-319-72550-5_28
Venue
RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING (SCDM 2018)
Keywords
Holy Quran, Text classification, Feature selection techniques, K nearest neighbor, Support vector machine, Naive Bayes, Decision trees
Field
k-nearest neighbors algorithm, Decision tree, Pearson product-moment correlation coefficient, Receiver operating characteristic, Feature selection, Naive Bayes classifier, Computer science, Support vector machine, C4.5 algorithm, Artificial intelligence, Natural language processing, Machine learning
DocType
Conference
Volume
700
ISSN
2194-5357
Citations
0
PageRank
0.34
References
5
Authors
4
Name                  Order  Citations  PageRank
Abdullahi O. Adeleke  1      0          0.34
Noor Azah Samsudin    2      15         4.54
Aida Mustapha         3      90         26.18
Nazri Mohd Nawi       4      158        22.90