Title |
---|
Automatic Document Clustering Based on Keyword Clusters Using Partitions of Weighted Digraphs |
Abstract |
---|
This paper proposes a new document clustering approach based on partitioning weighted directed graphs (digraphs). First, natural language processing and feature selection techniques are used to remove words that are useless for document clustering. Then, only the useful keywords are extracted and the association strengths between them are computed, which can greatly reduce the time and space complexity of the clustering algorithm. Next, the extracted keywords are treated as nodes, and the association strengths are used as the weights of arcs from keywords to their associated keywords, yielding a weighted keyword digraph. The strongly connected components of this digraph are then explored heuristically; these components represent the keyword clusters of the document collection. Finally, each document is clustered according to the similarity between its keywords and each of the keyword clusters. The experiments show that using keyword clusters in automatic document clustering can yield a high clustering precision rate. |
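
The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes documents have already been reduced to keyword lists, defines the directed association strength a -> b as the fraction of documents containing a that also contain b, keeps only arcs whose strength reaches a hypothetical `threshold`, and replaces the paper's heuristic exploration with Tarjan's exact strongly-connected-components algorithm. Each document is then assigned to the keyword cluster whose keywords it covers best, which is also an assumption. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def association_strengths(docs):
    """Directed strength a -> b: fraction of documents containing a that also contain b.

    Assumption: this conditional co-occurrence ratio is chosen only because it is
    asymmetric (so it yields directed arcs); it is not the paper's formula.
    """
    df = defaultdict(int)  # number of documents containing each keyword
    co = defaultdict(int)  # number of documents containing both keywords of an ordered pair
    for doc in docs:
        kws = set(doc)
        for k in kws:
            df[k] += 1
        for a, b in permutations(kws, 2):
            co[(a, b)] += 1
    return {(a, b): n / df[a] for (a, b), n in co.items()}


def build_digraph(strengths, threshold):
    """Keep only arcs a -> b whose strength reaches the (hypothetical) threshold."""
    graph = defaultdict(dict)
    for (a, b), w in strengths.items():
        if w >= threshold:
            graph[a][b] = w  # weight is stored, but the SCC search below ignores it
    return graph


def strongly_connected_components(nodes, graph):
    """Tarjan's algorithm, swapped in for the paper's heuristic exploration."""
    index, low, on_stack = {}, {}, set()
    stack, sccs = [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, {}):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop the whole component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return sccs


def cluster_documents(docs, threshold=0.6, min_size=2):
    """Return (keyword_clusters, label per document); a label is None if there is no overlap."""
    graph = build_digraph(association_strengths(docs), threshold)
    nodes = sorted({k for doc in docs for k in doc})
    clusters = [c for c in strongly_connected_components(nodes, graph)
                if len(c) >= min_size]
    labels = []
    for doc in docs:
        kws = set(doc)
        best, best_score = None, 0.0
        for i, c in enumerate(clusters):
            score = len(kws & c) / len(c)  # share of the cluster's keywords found in the doc
            if score > best_score:
                best, best_score = i, score
        labels.append(best)
    return clusters, labels


if __name__ == "__main__":
    docs = [
        ["graph", "digraph", "arc"],
        ["graph", "digraph"],
        ["protein", "gene"],
        ["gene", "protein", "cell"],
    ]
    clusters, labels = cluster_documents(docs)
    print(sorted(sorted(c) for c in clusters))  # [['digraph', 'graph'], ['gene', 'protein']]
    print(labels)
```

In this toy example, `arc` and `cell` have outgoing strong arcs but no incoming ones at `threshold=0.6`, so each forms a singleton component that `min_size` filters out. Lowering `threshold` to 0.5 keeps weaker arcs such as graph -> arc, and the clusters grow to {arc, digraph, graph} and {cell, gene, protein}.
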

Property | Value |
---|---|
Year | 2004 |
Venue | Computer Systems Science and Engineering |
Keywords | document clustering, keyword clustering, weighted directional graphs, information retrieval |
Field | Cluster (physics), Data mining, Fuzzy clustering, Correlation clustering, Document clustering, Computer science, Cluster analysis, Database, Single-linkage clustering |
DocType | Journal |
Volume | 19 |
Issue | 1 |
ISSN | 0267-6192 |
Citations | 0 |
PageRank | 0.34 |
References | 0 |
Authors | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Hsi-Cheng Chang | 1 | 24 | 2.38 |
Chiun-chieh Hsu | 2 | 0 | 0.68 |
Chi-kai Chan | 3 | 0 | 0.34 |