Title |
---|
A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization |
Abstract |
---|
Feature reduction methods have been successfully applied to text categorization. In this paper, we perform a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency Inverse Document Frequency (TFIDF) and Latent Semantic Analysis (LSA). Our feature set is relatively large, since the text files contain thousands of distinct terms. We propose using these feature reduction methods as a preprocessor for a Back-Propagation Neural Network (BPNN) to reduce the size of the input data during training. Experimental results on an Arabic data set show that, of the three dimensionality reduction techniques compared, TFIDF is the most effective at reducing the dimensionality of the feature space. |
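The abstract does not spell out how DF and TFIDF select terms. The following is a minimal Python sketch assuming the textbook definitions: DF is the number of documents that contain a term, and a term's TFIDF weight in a document is tf × log(N / df). Ranking terms by their highest TFIDF weight, the `min_df` and `k` thresholds, the helper names (`select_by_df`, `select_by_tfidf`, `to_vector`) and the toy transliterated tokens are all hypothetical choices, not the authors' settings. Arabic-specific preprocessing (normalization, stemming) and LSA are not shown. The retained vocabulary fixes the length of the input vector that would be fed to a BPNN.

```python
import math
from collections import Counter

def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # a term counts at most once per document
    return df

def select_by_df(docs, min_df):
    """DF thresholding: keep terms occurring in at least min_df documents (hypothetical threshold)."""
    df = document_frequency(docs)
    return sorted(t for t, c in df.items() if c >= min_df)

def select_by_tfidf(docs, k):
    """Rank terms by their highest tf * log(N / df) weight in any document; keep the top k."""
    n = len(docs)
    df = document_frequency(docs)
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            weight = count * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), weight)
    # Descending weight, ties broken alphabetically so the result is deterministic.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]

def to_vector(doc, vocabulary):
    """Project a tokenized document onto the reduced vocabulary as raw term counts."""
    tf = Counter(doc)
    return [tf[t] for t in vocabulary]

# Toy corpus of already-tokenized documents (hypothetical data).
docs = [["kitab", "qalam", "kitab"], ["kitab", "madrasa"], ["qalam", "bayt", "bayt"]]
print(select_by_df(docs, 2))      # ['kitab', 'qalam']
print(select_by_tfidf(docs, 2))   # ['bayt', 'madrasa']
print(to_vector(docs[0], select_by_df(docs, 2)))  # [2, 1]
```

Note that the two criteria pick different terms here: DF favors widely shared terms, while TFIDF favors terms that are frequent within a document but rare across the corpus.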
Year | DOI | Venue |
---|---|---|
2010 | 10.1007/978-3-642-14306-9_67 | Communications in Computer and Information Science |
Keywords | Field | DocType |
---|---|---|
Feature Reduction, Back-Propagation Neural Network, Arabic Text Categorization, DF, TFIDF, Latent Semantic Analysis | Feature vector, Dimensionality reduction, tf–idf, Pattern recognition, Arabic, Computer science, Curse of dimensionality, Preprocessor, Natural language processing, Artificial intelligence, Text categorization, Artificial neural network | Conference |

Volume | ISSN | Citations |
---|---|---|
88 | 1865-0929 | 2 |

PageRank | References | Authors |
---|---|---|
0.38 | 8 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fouzi Harrag | 1 | 27 | 3.75 |
Eyas El-qawasmeh | 2 | 193 | 20.88 |
Abdul Malik S. Al-Salman | 3 | 6 | 1.49 |