Abstract |
---|
Source code analysis to detect code cloning, code plagiarism, and code reuse suffers from the problem of pervasive code modifications, i.e. transformations that may have a global effect. We compare 30 similarity detection techniques and tools against pervasive code modifications. We evaluate the tools using two experimental scenarios for Java source code. These are (1) pervasive modifications created with tools for source code and bytecode obfuscation and (2) source code normalisation through compilation and decompilation using different decompilers. Our experimental results show that highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures. Our study strongly validates the use of compilation/decompilation as a normalisation technique. Its use reduced false classifications to zero for six of the tools. This broad, thorough study is the largest in existence and potentially an invaluable guide for future users of similarity detection in source code. |

Year | DOI | Venue |
---|---|---|
2016 | 10.1109/SCAM.2016.13 | 2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM) |

Keywords | Field | DocType |
---|---|---|
source code similarity, decompilation, code normalisation, code cloning, code reuse, code plagiarism | Codebase, Static program analysis, Programming language, Source code, Computer science, Code generation, Theoretical computer science, KPI-driven code analysis, Code reuse, Bytecode, Code review | Conference |

ISSN | ISBN | Citations |
---|---|---|
1942-5430 | 978-1-5090-3849-7 | 4 |

PageRank | References | Authors |
---|---|---|
0.43 | 28 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Chaiyong Ragkhitwetsagul | 1 | 4 | 1.11 |
Jens Krinke | 2 | 1533 | 76.35 |
David M. Clark | 3 | 153 | 16.33 |