Title
Design and Implementation of Chinese Text Clustering System
Abstract
Clustering is a core technology of text mining: text clustering divides a large number of texts into several meaningful classes, or clusters. Based on the features of Chinese documents, this paper designs and implements a Chinese Text Clustering System that clusters Chinese documents automatically. First, the system segments the input Chinese document set into words using the reverse maximum matching method. Second, it performs further text preprocessing. Finally, it applies the K-means clustering algorithm to obtain the clustering results. Search engines can also use the prototype system to cluster Chinese Web pages and identify a user's interest model, which improves the efficiency of searching for target content.
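The three stages named in the abstract (reverse maximum matching segmentation, preprocessing, K-means) can be illustrated with a minimal sketch. It is not the paper's implementation: the toy dictionary, the `max_len` limit, stop-word removal as the preprocessing step, raw term-frequency vectors, squared Euclidean distance, and farthest-first centroid seeding are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from collections import Counter

# Toy dictionary and stop-word list (assumptions; the paper's resources are not given).
DICTIONARY = {"研究", "研究生", "生命", "起源", "搜索", "引擎", "搜索引擎", "聚类"}
STOP_WORDS = {"的"}


def rmm_segment(text, dictionary, max_len=4):
    """Reverse maximum matching: scan from the end of the text and take the
    longest dictionary word that ends at the current position."""
    words = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()
    return words


def preprocess(tokens):
    """Assumed preprocessing step: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(docs_tokens):
    """Build term-frequency vectors over the shared vocabulary."""
    vocab = sorted({w for toks in docs_tokens for w in toks})
    vectors = []
    for toks in docs_tokens:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means. Centroids are seeded farthest-first (an assumption;
    the source does not say how the initial centroids are chosen)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    docs = ["研究生命的起源", "研究生命", "搜索引擎聚类", "聚类搜索引擎"]
    tokens = [preprocess(rmm_segment(d, DICTIONARY)) for d in docs]
    _, vectors = vectorize(tokens)
    print(tokens)
    print(kmeans(vectors, k=2))
```

Scanning from the end is what lets the segmenter read 研究生命的起源 as 研究 / 生命 / 的 / 起源 rather than splitting off 研究生 first. In practice the number of clusters `k` and the term weighting would need tuning for a real document set.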
Year: 2009
DOI: 10.1109/NCM.2009.234
Venue: NCM
Keywords: clustering result,chinese word segmentation,chinese web pages clustering,pattern clustering,input chinese document set,reverse maximum matching,chinese document,k-means clustering algorithm,chinese text clustering system,internet,chinese word automatic segmentation,data mining,k-means algorithm,text analysis,text preprocessing,chinese word,chinese text clustering,search engines,text mining,automatic clustering,text message,text clustering,chinese web page,clustering algorithms,k means clustering,k means algorithm,k means clustering algorithm,maximum matching,accuracy,search engine,web pages
Field: Canopy clustering algorithm,Fuzzy clustering,Data mining,Clustering high-dimensional data,Data stream clustering,Correlation clustering,Information retrieval,Computer science,Conceptual clustering,Brown clustering,Cluster analysis
DocType: Conference
ISBN: 978-0-7695-3769-6
Citations: 1
PageRank: 0.38
References: 2
Authors: 4
Name            Order   Citations   PageRank
Ying Tan        1       1           0.38
Huang Lan       2       101         3.31
Hong Qi         3       41          3.41
Yandong Zhai    4       12          2.44