Title
Can I clone this piece of code here?
Abstract
While code cloning is a convenient way for developers to reuse existing code, it can have negative impacts, such as degrading code quality or increasing maintenance costs. Some cloned code pieces are viewed as harmless because they evolve independently, while others are viewed as harmful because they need to be changed consistently, thus incurring extra maintenance costs. Recent studies demonstrate that neither the percentage of harmful code clones nor that of harmless code clones is negligible. To help developers leverage the benefits of harmless code cloning or avoid the negative impacts of harmful code cloning, we propose a novel approach that automatically predicts the harmfulness of a code cloning operation at the point of performing copy-and-paste. Our insight is that the potential harmfulness of a code cloning operation may relate to characteristics of the code to be cloned and of its context. Based on a number of features extracted from the cloned code and the context of the code cloning operation, we use Bayesian Networks, a machine-learning technique, to predict the harmfulness of an intended code cloning operation. We evaluated our approach on two large-scale industrial software projects under two usage scenarios: 1) approving only cloning operations predicted to be very likely harmless, and 2) blocking only cloning operations predicted to be very likely harmful. In the first scenario, our approach is able to approve more than 50% of the cloning operations with a precision higher than 94.9% in both subjects. In the second scenario, our approach is able to avoid more than 48% of the harmful cloning operations by blocking only 15% of the cloning operations for the first subject, and avoid more than 67% of the harmful cloning operations by blocking only 34% of the cloning operations for the second subject.
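As a rough illustration of the idea in the abstract, the sketch below predicts a harmfulness probability from a few features and then applies the two usage scenarios as thresholds. It is not the authors' implementation: it uses naive Bayes (the simplest Bayesian network structure) in place of the learned network, and the feature names (`same_file`, `long_snippet`), the training history, and the threshold values are all hypothetical.

```python
from collections import Counter

# Hypothetical training history: (features, was_harmful). Feature names and
# values are invented for illustration and are not taken from the paper.
HISTORY = [
    ({"same_file": True,  "long_snippet": True},  True),
    ({"same_file": True,  "long_snippet": False}, True),
    ({"same_file": False, "long_snippet": True},  False),
    ({"same_file": False, "long_snippet": False}, False),
    ({"same_file": False, "long_snippet": False}, False),
    ({"same_file": True,  "long_snippet": True},  True),
]

def train(history):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter()
    value_counts = Counter()
    for features, harmful in history:
        class_counts[harmful] += 1
        for name, value in features.items():
            value_counts[(harmful, name, value)] += 1
    return class_counts, value_counts

def prob_harmful(model, features):
    """Posterior P(harmful | features) under the naive Bayes assumption,
    with add-one smoothing over the two values of each boolean feature."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        score = (class_counts[label] + 1) / (total + 2)  # smoothed prior
        for name, value in features.items():
            seen = value_counts[(label, name, value)]
            score *= (seen + 1) / (class_counts[label] + 2)
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])

def decide(p_harmful, approve_below=0.2, block_above=0.8):
    """Approve clones very likely harmless, block clones very likely harmful.
    The threshold values are assumptions chosen for this example."""
    if p_harmful <= approve_below:
        return "approve"
    if p_harmful >= block_above:
        return "block"
    return "undecided"

model = train(HISTORY)
risky = prob_harmful(model, {"same_file": True, "long_snippet": True})
safe = prob_harmful(model, {"same_file": False, "long_snippet": False})
print(decide(risky), decide(safe))
```

The abstract evaluates the two scenarios separately; combining them in one `decide` function is a simplification made here so that both thresholds are visible side by side.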
Year
2012
DOI
10.1145/2351676.2351701
Venue
ASE
Keywords
belief networks, software reuse, copy-and-paste operation, intended code, cloning operation, harmful code clone, code cloning, machine learning technique, large-scale industrial software project, harmfulness prediction, programming aid, existing code, bayesian networks, degrading code quality, harmless code cloning, maintenance cost, software reusability, harmful code cloning, harmless code clone, code piece, code quality degradation, bayesian network, feature extraction, machine learning
Field
Computer science, Computer security, Reuse, Bayesian network, Industrial software, Cloning, Software quality, Code (cryptography)
DocType
Conference
ISSN
1527-1366
ISBN
978-1-4503-1204-2
Citations
19
PageRank
0.57
References
22
Authors
6
Name             Order   Citations   PageRank
Xiaoyin Wang     1       749         29.19
Yingnong Dang    2       537         26.92
Lingming Zhang   3       2726        154.39
Dongmei Zhang    4       1439        132.94
Erica Lan        5       24          0.99
Hong Mei         6       3535        219.36