Title
Automatic Document Clustering Based On Keyword Clusters Using Partitions Of Weighted Digraphs
Abstract
This paper proposes a new document clustering approach based on partitioning weighted directed graphs (digraphs). First, natural language processing and feature selection techniques are used to remove words that are useless for document clustering. Then only the useful keywords are extracted and the association strengths between them are computed, which greatly reduces the time and space complexity of the clustering algorithm. Next, the extracted keywords are treated as nodes, and the association strengths are used as the weights of the arcs from each keyword to its associated keywords, yielding a weighted keyword digraph. The strongly connected components of this digraph are then explored heuristically; these components represent the keyword clusters of the document collection. Finally, each document is clustered according to the similarity between its keywords and those of each keyword cluster. The experiments show that using keyword clusters in automatic document clustering can achieve a high clustering precision rate.
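The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny stop list, the `min_df` keyword filter, the association strength df(a, b) / df(a), the 0.5 arc threshold, and the overlap-based document similarity are all hypothetical choices, and an exact strongly-connected-components algorithm (Tarjan's) stands in for the paper's heuristic exploration.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical stop list; the paper's actual NLP and feature-selection steps are not reproduced.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for"}


def extract_keywords(docs, min_df=2):
    """Keep non-stopword terms occurring in at least min_df documents (assumed filter)."""
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            if term not in STOPWORDS:
                df[term] += 1
    return {t for t, n in df.items() if n >= min_df}


def build_keyword_digraph(docs, keywords, threshold=0.5):
    """Arc a -> b weighted by df(a, b) / df(a), an assumed asymmetric association strength."""
    df = defaultdict(int)  # documents containing each keyword
    co = defaultdict(int)  # documents containing both keywords of an ordered pair
    for doc in docs:
        present = set(doc) & keywords
        for t in present:
            df[t] += 1
        for a, b in permutations(present, 2):
            co[(a, b)] += 1
    graph = {k: {} for k in keywords}
    for (a, b), n in co.items():
        weight = n / df[a]
        if weight >= threshold:  # weak associations get no arc
            graph[a][b] = weight
    return graph


def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive; adequate for small keyword graphs)."""
    index, low, on_stack = {}, {}, set()
    stack, comps = [], []

    def visit(v):
        index[v] = low[v] = len(index)  # next unused DFS index
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in sorted(graph):  # try every keyword as a DFS root
        if v not in index:
            visit(v)
    return comps


def keyword_clusters(graph, min_size=2):
    """Keyword clusters = strongly connected components with at least min_size keywords."""
    return [c for c in strongly_connected_components(graph) if len(c) >= min_size]


def assign_documents(docs, clusters):
    """Label each document with the cluster whose keywords it covers best; -1 if it shares none."""
    labels = []
    for doc in docs:
        terms = set(doc)
        # Assumed similarity: fraction of the cluster's keywords present in the document.
        scores = [len(terms & c) / len(c) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__, default=-1)
        labels.append(best if best >= 0 and scores[best] > 0 else -1)
    return labels


if __name__ == "__main__":
    docs = [s.split() for s in [
        "graph digraph partition",
        "the graph digraph partition",
        "keyword cluster document",
        "keyword cluster of document",
        "graph digraph",
    ]]
    clusters = keyword_clusters(build_keyword_digraph(docs, extract_keywords(docs)))
    print(assign_documents(docs, clusters))
```

On these five toy documents the sketch finds two keyword clusters, {graph, digraph, partition} and {keyword, cluster, document}, and prints `[1, 1, 0, 0, 1]`. Because the association strength is asymmetric, an arc can exist in one direction only; requiring strong connectivity keeps only keywords that are mutually associated through some path, which is what makes the components usable as clusters.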
Year: 2004
Venue: COMPUTER SYSTEMS SCIENCE AND ENGINEERING
Keywords: document clustering, keyword clustering, weighted directional graphs, information retrieval
Field: Cluster (physics), Data mining, Fuzzy clustering, Correlation clustering, Document clustering, Computer science, Cluster analysis, Database, Single-linkage clustering
DocType: Journal
Volume: 19
Issue: 1
ISSN: 0267-6192
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Hsi-Cheng Chang | 1 | 24 | 2.38
Chiun-chieh Hsu | 2 | 0 | 0.68
Chi-kai Chan | 3 | 0 | 0.34