Title
Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach
Abstract
Context: Software clustering is a key reverse-engineering technique for recovering a high-level abstraction of a software system when resources are limited. Very little research has explicitly addressed how to find the optimum set of clusters in the design or how to penalize the formation of singleton clusters during clustering.
Objective: This paper attempts to enhance existing agglomerative clustering algorithms by introducing a complementary mechanism. To solve the architecture recovery problem, the proposed approach focuses on minimizing redundant effort and penalizing the formation of singleton clusters during clustering while maintaining the integrity of the results.
Method: An automated solution for cutting a dendrogram, based on least-squares regression, is presented in order to find the best cut level. A dendrogram is a tree diagram that shows the taxonomic relationships of clusters of software entities. In addition, a factor that penalizes clusters that would form singletons is introduced. Simulations were performed on two open-source projects. The proposed approach was compared against the exhaustive and highest-gap dendrogram cutting methods, as well as two well-known cluster validity indices, namely Dunn's index and the Davies-Bouldin index.
Results: When the clustering results were compared against the original package diagram, the approach achieved an average accuracy of 90.07% across the two simulations after the utility classes were removed. Utility classes in the source code affect the accuracy of software clustering, owing to their omnipresent behavior. The proposed approach also successfully penalized the formation of singleton clusters during clustering.
Conclusion: The evaluation indicates that the proposed approach can enhance the quality of the clustering results by guiding software maintainers through the cutting point selection process. The proposed approach can be used as a complementary mechanism to improve the effectiveness of existing clustering algorithms.
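As an illustration of the highest-gap baseline named in the abstract, the following minimal Python sketch cuts a dendrogram at the largest gap between successive merge heights and counts the singleton clusters that result. It is not the paper's regression-based method or its penalty factor, neither of which is specified in this record. The distance matrix `D`, the choice of single linkage, and all function names are assumptions made for the example.

```python
from itertools import combinations


def single_linkage(dist):
    """Agglomerative single-linkage clustering on a symmetric distance matrix.

    Returns the merges in order as (height, members_of_new_cluster).
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        return min(dist[i][j] for i in pair[0] for j in pair[1])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((linkage((a, b)), a | b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def highest_gap_threshold(merges):
    """Cut height halfway across the largest gap between successive merges.

    Assumes at least two merges (three or more entities).
    """
    heights = [h for h, _ in merges]
    k = max(range(len(heights) - 1), key=lambda i: heights[i + 1] - heights[i])
    return (heights[k] + heights[k + 1]) / 2


def cut_at(merges, n, threshold):
    """Clusters obtained by applying only the merges at or below the threshold."""
    clusters = [frozenset([i]) for i in range(n)]
    for height, merged in merges:
        if height <= threshold:
            # Replace the sub-clusters absorbed by this merge with the merged cluster.
            clusters = [c for c in clusters if not c <= merged] + [merged]
    return clusters


# Hypothetical dissimilarities between five software entities:
# 0 and 1 are close, 2 and 3 are close, and 4 is far from everything.
D = [
    [0.0, 1.0, 4.0, 4.0, 9.0],
    [1.0, 0.0, 4.0, 4.0, 9.0],
    [4.0, 4.0, 0.0, 1.5, 9.0],
    [4.0, 4.0, 1.5, 0.0, 9.0],
    [9.0, 9.0, 9.0, 9.0, 0.0],
]

merges = single_linkage(D)
threshold = highest_gap_threshold(merges)
clusters = cut_at(merges, len(D), threshold)
singletons = sum(1 for c in clusters if len(c) == 1)

print(threshold)                             # 6.5
print(sorted(sorted(c) for c in clusters))   # [[0, 1, 2, 3], [4]]
print(singletons)                            # 1
```

In this toy input the largest gap lies just below the final merge, so the cut leaves entity 4 as a singleton cluster. This is the kind of outcome that the penalty factor described in the abstract is meant to discourage.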
Year: 2013
DOI: 10.1016/j.infsof.2013.07.002
Venue: Information & Software Technology
Keywords: highest gap dendrogram, efficient software, complementary mechanism, software entity, guiding software maintainers, clustering result, utility class, clustering algorithm, software clustering, singleton cluster, preventive dendrogram, software maintenance
Field: Hierarchical clustering, Data mining, Fuzzy clustering, CURE data clustering algorithm, Correlation clustering, Computer science, Constrained clustering, Artificial intelligence, Cluster analysis, Brown clustering, Machine learning, Single-linkage clustering
DocType: Journal
Volume: 55
Issue: 11
ISSN: 0950-5849
Citations: 11
PageRank: 0.46
References: 28
Authors: 3
Name              Order  Citations  PageRank
Chun Yong Chong   1      14         2.93
Sai Peck Lee      2      142        22.55
Teck Chaw Ling    3      31         2.92