Title
An automatically constructed thesaurus for neural network based document categorization
Abstract
This paper presents a method for computing a thesaurus from a text corpus and combines it with a revised back-propagation neural network (BPNN) learning algorithm for document categorization. An automatically constructed thesaurus is a data structure built by extracting the relatedness between words. Neural networks are an efficient approach to document categorization; however, the conventional BPNN suffers from slow learning and a tendency to become trapped in local minima. We use a revised algorithm that improves on the conventional BPNN and overcomes these problems. A well-constructed thesaurus has been recognized as a valuable tool for effective document categorization: it overcomes some of the problems of bag-of-words categorization, which ignores the relationships between words. To investigate the effectiveness of our method, we conducted experiments on the standard Reuters-21578 collection. The experimental results show that the proposed model achieves higher categorization effectiveness as measured by precision, recall and F-measure.
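The abstract reports effectiveness in terms of precision, recall and F-measure without giving the formulas. A minimal sketch of how these are commonly computed for one category is shown below. It assumes the balanced F-measure (F1), which the abstract does not specify; the function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(true_labels, predicted_labels, category):
    """Per-category precision, recall and balanced F-measure (F1).

    Assumption: F-measure here means F1 (equal weight on precision and recall).
    """
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: documents of this category that were assigned to it.
    tp = sum(1 for t, p in pairs if t == category and p == category)
    # False positives: documents assigned to this category that belong elsewhere.
    fp = sum(1 for t, p in pairs if t != category and p == category)
    # False negatives: documents of this category that were assigned elsewhere.
    fn = sum(1 for t, p in pairs if t == category and p != category)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (not results from the paper).
true_labels = ["earn", "earn", "acq", "acq", "earn"]
predicted_labels = ["earn", "acq", "acq", "earn", "earn"]
precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels, "earn")
```

In this toy case the "earn" category has two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.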
Year
2009
DOI
10.1016/j.eswa.2009.02.006
Venue
Expert Syst. Appl.
Keywords
efficient approach, higher categorization effectiveness, slow learning, neural network, neural networks, effective operation, automatically constructed thesaurus, revised back-propagation neural network, revised algorithm, data structure, conventional BPNN, document categorization, bag of words
Field
Bag-of-words model, Data structure, Categorization, Data mining, Boosting methods for object categorization, Computer science, Text corpus, Artificial intelligence, Artificial neural network, Recall, Machine learning
DocType
Journal
Volume
36
Issue
8
ISSN
Expert Systems With Applications
Citations
5
PageRank
0.40
References
28
Authors
3
Name             Order  Citations  PageRank
Cheng Hua Li     1      197        12.83
Wei Song         2      113        15.51
Soon Cheol Park  3      197        14.78