Title
A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization
Abstract
Feature reduction methods have been successfully applied to text categorization. In this paper, we perform a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency Inverse Document Frequency (TFIDF) and Latent Semantic Analysis (LSA). Our feature set is relatively large, since thousands of distinct terms occur across the text files. We propose using these feature reduction methods as a preprocessor for a Back-Propagation Neural Network (BPNN) to reduce the input data in the training process. The experimental results on an Arabic data set demonstrate that, among the three dimensionality reduction techniques compared, TFIDF was the most effective in reducing the dimensionality of the feature space.
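To make the preprocessing idea concrete, the following is a minimal sketch of two of the three reduction methods named in the abstract (DF and TFIDF) used as term selectors ahead of a classifier. It is not the authors' implementation: the abstract does not give the weighting formula, the cut-off `k`, or the tokenization, so the `tf * log(N / df)` weighting, the function names, and the toy English corpus are all assumptions chosen for illustration. LSA is left out because it requires a singular value decomposition, which goes beyond a standard-library example.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): each document counts a term once
    return df


def select_by_df(docs, k):
    """Keep the k terms that occur in the most documents (assumed DF criterion)."""
    df = document_frequency(docs)
    # Sort by descending DF, then alphabetically so ties are deterministic.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]


def select_by_tfidf(docs, k):
    """Keep the k terms with the highest TF-IDF weight summed over all documents.

    Assumed weighting: tf * log(N / df); the paper may use a different variant.
    """
    df = document_frequency(docs)
    n_docs = len(docs)
    scores = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            scores[term] += count * math.log(n_docs / df[term])
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def to_vector(tokens, vocabulary):
    """Map a tokenized document to a term-count vector over the reduced vocabulary."""
    tf = Counter(tokens)
    return [tf[term] for term in vocabulary]


# Hypothetical toy corpus standing in for tokenized Arabic documents.
docs = [
    ["market", "price", "market", "bank"],
    ["match", "goal", "team", "goal"],
    ["bank", "price", "loan"],
    ["team", "match", "bank"],
]

df_vocab = select_by_df(docs, 3)
tfidf_vocab = select_by_tfidf(docs, 2)
reduced_input = [to_vector(d, tfidf_vocab) for d in docs]
```

The reduced vectors in `reduced_input` are what would be passed to the neural network in place of the full term space. On this toy corpus the two selectors disagree: DF favours the widely spread term "bank", while the assumed TFIDF weighting favours terms concentrated in few documents.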
Year: 2010
DOI: 10.1007/978-3-642-14306-9_67
Venue: Communications in Computer and Information Science
Keywords: Feature Reduction, Back-Propagation Neural Network, Arabic Text Categorization, DF, TFIDF, Latent Semantic Analyses
Field: Feature vector, Dimensionality reduction, tf–idf, Pattern recognition, Arabic, Computer science, Curse of dimensionality, Preprocessor, Natural language processing, Artificial intelligence, Text categorization, Artificial neural network
DocType: Conference
Volume: 88
ISSN: 1865-0929
Citations: 2
PageRank: 0.38
References: 8
Authors: 3
Name, Order, Citations, PageRank
Fouzi Harrag, 1, 27, 3.75
Eyas El-qawasmeh, 2, 193, 20.88
Abdul Malik S. Al-Salman, 3, 6, 1.49