Abstract | ||
---|---|---|
Recently, tree structures have become a popular way of storing huge amounts of data. Clustering these data can facilitate operations such as storage, retrieval, rule extraction and processing. In this paper, we propose a novel heuristic algorithm for clustering tree-structured data, called TreeCluster. The algorithm maintains a representative tree for each cluster and differs significantly from traditional methods based on computing tree edit distance. TreeCluster compares each input tree T only with the representative trees of the clusters, which significantly reduces the running time. We show the efficiency of TreeCluster in terms of time complexity. Furthermore, we empirically evaluate the effectiveness and accuracy of the TreeCluster algorithm in comparison with previous work. Our experimental results show that TreeCluster improves several cluster quality measures, such as intra-cluster similarity, inter-cluster similarity, and the Dunn and Davies-Bouldin (DB) indices. |
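
The abstract's central idea, comparing each input tree only with one representative per cluster rather than with every other tree, can be illustrated with a minimal sketch. This is not the TreeCluster algorithm from the paper: the similarity measure (Jaccard overlap of parent-child label pairs), the `threshold` parameter, the choice of the first member as a fixed representative, and all function names are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical tree encoding: (label, (child, child, ...)).
Tree = Tuple[str, tuple]
Edge = Tuple[str, str]


def edges(tree: Tree) -> Set[Edge]:
    """Collect the (parent label, child label) pairs of a tree."""
    label, children = tree
    found: Set[Edge] = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found


def similarity(a: Set[Edge], b: Set[Edge]) -> float:
    """Jaccard overlap of two edge sets (an assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_trees(trees: List[Tree], threshold: float = 0.5) -> List[List[int]]:
    """Assign each tree to the most similar representative, or open a new cluster.

    Each tree is compared only with the cluster representatives, so one pass
    costs (number of trees) x (number of clusters) comparisons.
    """
    representatives: List[Set[Edge]] = []  # one edge set per cluster
    clusters: List[List[int]] = []         # indices of member trees
    for index, tree in enumerate(trees):
        tree_edges = edges(tree)
        best, best_sim = -1, 0.0
        for c, rep in enumerate(representatives):
            sim = similarity(tree_edges, rep)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(index)
        else:
            # Assumption: the first member serves as a fixed representative.
            representatives.append(tree_edges)
            clusters.append([index])
    return clusters


if __name__ == "__main__":
    t1 = ("a", (("b", ()), ("c", ())))
    t2 = ("a", (("b", ()), ("c", ()), ("d", ())))
    t3 = ("x", (("y", ()),))
    print(cluster_trees([t1, t2, t3]))
```

With k clusters and n input trees, a pass of this kind performs on the order of n·k comparisons instead of the n² pairwise comparisons that a full distance matrix would need, which is the kind of saving the abstract refers to.
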
Year | DOI | Venue |
---|---|---|
2007 | 10.3233/IDA-2007-11404 | Intell. Data Anal. |
Keywords | Field | DocType |
cluster quality,tree structure,TreeCluster algorithm,intra-cluster similarity,clustering tree,representative tree,input tree,inter-cluster similarity,heuristic algorithm,time complexity | Data mining,Computer science,K-ary tree,Vantage-point tree,Tree structure,Artificial intelligence,Interval tree,Tree traversal,Pattern recognition,Segment tree,Machine learning,Search tree,Incremental decision tree | Journal |
Volume | Issue | ISSN |
11 | 4 | 1088-467X |
Citations | PageRank | References |
1 | 0.35 | 30 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mostafa Haghir Chehreghani | 1 | 50 | 8.46 |
Masoud Rahgozar | 2 | 72 | 8.77 |
Caro Lucas | 3 | 1501 | 103.34 |
Morteza Haghir Chehreghani | 4 | 110 | 16.07 |