Title
TECCD: A Tree Embedding Approach for Code Clone Detection
Abstract
Clone detection techniques have been explored for decades. Recently, deep learning techniques have been adopted to improve code representation capability and advance the state of the art in code clone detection. These approaches usually require transforming the AST into a binary tree to incorporate syntactic information, which introduces overhead. Moreover, these approaches conduct term embedding, which requires large training datasets. In this paper, we introduce a tree embedding technique for clone detection. Our approach first conducts tree embedding to obtain a node vector for each intermediate node in the AST, which captures the structural information of the AST. We then compose a tree vector from its constituent node vectors using a lightweight method. Lastly, Euclidean distances between tree vectors are measured to determine code clones. We implement our approach in a tool called TECCD and evaluate it on BigCloneBench (BCB) and 7 other large-scale Java projects. The results show that our approach achieves good accuracy and recall and outperforms existing approaches.
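The last two steps of the pipeline described in the abstract (composing a tree vector from node vectors, then comparing trees by Euclidean distance) can be illustrated with a minimal sketch. This is not the TECCD implementation: the abstract does not say what the "lightweight method" is, so element-wise summation is assumed here, and the node vectors, the node type names, and the threshold value are all hypothetical toy values rather than learned embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 2-dimensional node vectors keyed by AST node type.
# In the paper these would come from the tree embedding step; the
# numbers below are hand-picked for illustration only.
NODE_VECTORS: Dict[str, Vector] = {
    "MethodDeclaration": [1.0, 0.0],
    "ForStatement": [0.0, 1.0],
    "WhileStatement": [0.0, 0.9],
    "IfStatement": [0.5, 0.5],
    "TryStatement": [-1.0, 2.0],
}


def compose_tree_vector(node_types: Sequence[str],
                        node_vectors: Dict[str, Vector]) -> Vector:
    """Compose a tree vector by summing the vectors of its intermediate nodes.

    Summation is an assumption made for this sketch; the source only
    says the composition is lightweight.
    """
    dim = len(next(iter(node_vectors.values())))
    total = [0.0] * dim
    for node_type in node_types:
        for i, value in enumerate(node_vectors[node_type]):
            total[i] += value
    return total


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_clone(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Report a clone pair when the tree vectors are within the threshold."""
    return euclidean_distance(a, b) <= threshold


if __name__ == "__main__":
    # Each tree is reduced to the list of its intermediate node types.
    tree_a = ["MethodDeclaration", "ForStatement", "IfStatement"]
    tree_b = ["MethodDeclaration", "WhileStatement", "IfStatement"]
    tree_c = ["MethodDeclaration", "TryStatement"]

    vec_a = compose_tree_vector(tree_a, NODE_VECTORS)
    vec_b = compose_tree_vector(tree_b, NODE_VECTORS)
    vec_c = compose_tree_vector(tree_c, NODE_VECTORS)

    threshold = 0.5  # hypothetical cut-off, not taken from the paper
    print("a vs b:", is_clone(vec_a, vec_b, threshold))
    print("a vs c:", is_clone(vec_a, vec_c, threshold))
```

In this toy setup, two methods that differ only in loop construct end up close together because their node vectors are similar, while a structurally different method lands farther away. A real system would have to choose the threshold empirically.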
Year: 2019
DOI: 10.1109/ICSME.2019.00025
Venue: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)
Keywords: code clone detection, AST, Skip-gram
Field: Tree embedding, Data mining, Systems engineering, Computer science, Binary tree, Artificial intelligence, Deep learning, Euclidean geometry, Java, Code clone
DocType: Conference
ISSN: 1063-6773
ISBN: 978-1-7281-3095-8
Citations: 3
PageRank: 0.37
References: 28
Authors: 6
Name           Order  Citations  PageRank
Yi Gao         1      3          0.37
Zan Wang       2      12         1.53
Shuang Liu33622.95
Lin Yang       4      3          0.37
Wei Sang       5      3          0.37
Yuanfang Cai   6      1169       76.99