Title
A Weighted Cluster-based Chinese Text Categorization Approach: Incorporating with Word Clusters
Abstract
Most research on text categorization focuses on the bag-of-words representation. Some studies have proposed other representations for classification, such as term phrases, Latent Semantic Indexing, and term clustering. Term clustering is an effective approach to classification and has been shown to be a good method for reducing the dimensionality of term vectors. We use hierarchical term clustering to aggregate similar terms. To enhance performance, we present a modified indexing scheme that uses the terms in each cluster. Our test collection was extracted from Chinese NETNEWS, and we used a centroid-based classifier for the categorization task. The results show that term clustering not only reduces the dimensionality but also outperforms bag of words. Term clustering can therefore be applied to text classification with any large corpus, with the aim of saving time and increasing efficiency and effectiveness. Beyond performance, the clusters can be regarded as a conceptual knowledge base that preserves related real-world terms.
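The pipeline outlined in the abstract (index documents by term cluster instead of by individual term, then classify with class centroids) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the cluster assignments in `TERM_TO_CLUSTER` are written by hand rather than produced by hierarchical clustering, the per-cluster weight is a plain count sum rather than the paper's weighting scheme, the English tokens stand in for pre-segmented Chinese terms, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical hand-made clusters; the paper derives them by hierarchical clustering.
TERM_TO_CLUSTER = {
    "stock": 0, "shares": 0, "market": 0,
    "basketball": 1, "game": 1, "player": 1,
}
N_CLUSTERS = 2


def cluster_vector(tokens, term_to_cluster, n_clusters):
    """Index a token list by cluster: sum term counts per cluster, then L2-normalize."""
    vec = [0.0] * n_clusters
    for tok in tokens:
        cid = term_to_cluster.get(tok)
        if cid is not None:  # terms that belong to no cluster are ignored
            vec[cid] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def train_centroids(docs, labels, term_to_cluster, n_clusters):
    """Compute one centroid per class as the mean of its documents' cluster vectors."""
    sums = defaultdict(lambda: [0.0] * n_clusters)
    counts = defaultdict(int)
    for tokens, label in zip(docs, labels):
        vec = cluster_vector(tokens, term_to_cluster, n_clusters)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def classify(tokens, centroids, term_to_cluster, n_clusters):
    """Assign the class whose centroid is most similar to the document vector."""
    vec = cluster_vector(tokens, term_to_cluster, n_clusters)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training set (invented for this sketch, not data from the paper).
docs = [["stock", "market"], ["shares", "stock"],
        ["game", "player"], ["basketball", "game"]]
labels = ["finance", "finance", "sports", "sports"]
centroids = train_centroids(docs, labels, TERM_TO_CLUSTER, N_CLUSTERS)
print(classify(["shares", "market", "player"], centroids, TERM_TO_CLUSTER, N_CLUSTERS))
```

In this sketch the vector length equals the number of clusters rather than the vocabulary size, which is the dimensionality reduction the abstract refers to.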
Year: 2012
DOI: 10.1109/IIAI-AAI.2012.63
Venue: IIAI-AAI
Keywords: term clustering,latent semantic indexing,pattern clustering,centroid-based classifier,vector space model,centroid based classifier,word clusters,chinese netnews,weighted cluster-based chinese text,categorization problems,pattern classification,term phrase,conceptual knowledge base,text classification,term vectors,categorization approach,word clustering,bag of words,hierarchical term clustering,text categorization,similar term,feature selection,term vector,text analysis,information retrieval,support vector machines,testing,machine learning,accuracy,clustering algorithms
Field: Bag-of-words model,Fuzzy clustering,Categorization,Clustering high-dimensional data,Correlation clustering,Computer science,Artificial intelligence,Vector space model,Conceptual clustering,Cluster analysis,Machine learning
DocType: Conference
ISBN: 978-1-4673-2719-0
Citations: 2
PageRank: 0.36
References: 18
Authors: 2
Name
Order
Citations
PageRank
Yu-Chieh Wu124723.16
Jie-Chi Yang235043.91