Abstract | ||
---|---|---|
Clone detection techniques have been explored for decades. Recently, deep learning techniques have been adopted to improve code representation capability and advance the state of the art in code clone detection. These approaches usually require a transformation from the AST to a binary tree to incorporate syntactic information, which introduces overhead. Moreover, these approaches rely on term embedding, which requires large training datasets. In this paper, we introduce a tree embedding technique for clone detection. Our approach first conducts tree embedding to obtain a node vector for each intermediate node in the AST, which captures the structural information of the AST. Then we compose a tree vector from its constituent node vectors using a lightweight method. Lastly, Euclidean distances between tree vectors are measured to determine code clones. We implement our approach in a tool called TECCD and conduct an evaluation using BigCloneBench (BCB) and seven other large-scale Java projects. The results show that our approach achieves good accuracy and recall and outperforms existing approaches. |
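
The last two steps of the abstract (composing a tree vector from node vectors, then comparing tree vectors by Euclidean distance) can be illustrated with a minimal sketch. This is not TECCD's implementation: the abstract does not say how the "lightweight method" composes vectors, so averaging is assumed here, and the function names, the toy vectors, and the `threshold` value are all hypothetical.

```python
import math


def compose_tree_vector(node_vectors):
    """Combine node vectors into one tree vector.

    Assumption: element-wise averaging stands in for the paper's
    unspecified "lightweight method".
    """
    if not node_vectors:
        raise ValueError("a tree needs at least one node vector")
    dim = len(node_vectors[0])
    count = len(node_vectors)
    return [sum(vec[i] for vec in node_vectors) / count for i in range(dim)]


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_clone(nodes_a, nodes_b, threshold=0.5):
    """Report a clone pair when the tree vectors are close enough.

    The threshold is a hypothetical value chosen for illustration only.
    """
    tree_a = compose_tree_vector(nodes_a)
    tree_b = compose_tree_vector(nodes_b)
    return euclidean_distance(tree_a, tree_b) <= threshold


# Toy node vectors for three code fragments (hypothetical values).
fragment_1 = [[1.0, 2.0], [3.0, 4.0]]
fragment_2 = [[1.0, 2.0], [3.0, 4.5]]
fragment_3 = [[9.0, 9.0], [7.0, 7.0]]

print(is_clone(fragment_1, fragment_2))  # close tree vectors
print(is_clone(fragment_1, fragment_3))  # distant tree vectors
```

In a real pipeline the node vectors would come from the tree embedding step (the keywords suggest a Skip-gram model trained over AST nodes), and the threshold would be tuned on labelled clone pairs.
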
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICSME.2019.00025 | 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) |
Keywords | Field | DocType |
code clone detection, AST, Skip-gram | Tree embedding, Data mining, Systems engineering, Computer science, Binary tree, Artificial intelligence, Deep learning, Euclidean geometry, Java, Code clone | Conference |
ISSN | ISBN | Citations |
1063-6773 | 978-1-7281-3095-8 | 3 |
PageRank | References | Authors |
0.37 | 28 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yi Gao | 1 | 3 | 0.37 |
Zan Wang | 2 | 12 | 1.53 |
Shuang Liu | 3 | 36 | 22.95 |
Lin Yang | 4 | 3 | 0.37 |
Wei Sang | 5 | 3 | 0.37 |
Yuanfang Cai | 6 | 1169 | 76.99 |