Title
Detecting code clones in binary executables
Abstract
Large software projects contain significant code duplication, mainly due to copying and pasting code. Many techniques have been developed to identify duplicated code to enable applications such as refactoring, detecting bugs, and protecting intellectual property. Because source code is often unavailable, especially for third-party software, finding duplicated code in binaries becomes particularly important. However, existing techniques operate primarily on source code, and no effective tool exists for binaries. In this paper, we describe the first practical clone detection algorithm for binary executables. Our algorithm extends an existing tree similarity framework based on clustering of characteristic vectors of labeled trees with novel techniques to normalize assembly instructions and to accurately and compactly model their structural information. We have implemented our technique and evaluated it on Windows XP system binaries totaling over 50 million assembly instructions. Results show that it is both scalable and precise: it analyzed Windows XP system binaries in a few hours and produced few false positives. We believe our technique is a practical, enabling technology for many applications dealing with binary code.
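The abstract names two ingredients only at a high level: normalizing assembly instructions, and comparing code fragments through characteristic vectors. The sketch below illustrates both under stated assumptions and is not the paper's method. It replaces concrete x86 operands with coarse classes (REG, MEM, VAL), counts mnemonics and operand classes over sliding windows of instructions, and buckets windows whose vectors are identical. The register set, the operand classes, the window size, and every function name (`normalize`, `characteristic_vector`, `l1_distance`, `find_clones`) are hypothetical. The paper models structural information through labeled trees and clusters vectors approximately. This sketch uses flat instruction windows and swaps in exact bucketing, plus an L1 distance helper for comparing near matches.

```python
from collections import Counter, defaultdict
import re

# Hypothetical operand classes; a real normalizer would cover all x86 registers,
# segment overrides, SIB addressing, and so on.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}


def normalize_operand(op):
    """Map one Intel-syntax operand to a coarse class."""
    op = op.strip().lower()
    if op in REGISTERS:
        return "REG"
    if "[" in op:  # memory reference, e.g. [ebp-8] or dword ptr [eax]
        return "MEM"
    if re.fullmatch(r"0x[0-9a-f]+|-?\d+", op):
        return "VAL"
    return "OTHER"  # labels, call targets, registers not listed above


def normalize(instr):
    """Turn 'mov eax, [ebp-8]' into 'mov REG,MEM'."""
    parts = instr.split(None, 1)
    mnemonic = parts[0].lower()
    if len(parts) == 1:
        return mnemonic
    kinds = [normalize_operand(o) for o in parts[1].split(",")]
    return mnemonic + " " + ",".join(kinds)


def characteristic_vector(window):
    """Count mnemonics and operand classes in a window of normalized instructions."""
    vec = Counter()
    for tok in window:
        mnemonic, _, ops = tok.partition(" ")
        vec["op:" + mnemonic] += 1
        for kind in filter(None, ops.split(",")):
            vec["arg:" + kind] += 1
    return vec


def l1_distance(a, b):
    """Manhattan distance between two sparse count vectors (missing keys count as 0)."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))


def find_clones(instrs, window=4, stride=1):
    """Return start offsets of windows whose vectors are identical (exact bucketing)."""
    norm = [normalize(i) for i in instrs]
    buckets = defaultdict(list)
    for start in range(0, len(norm) - window + 1, stride):
        vec = characteristic_vector(norm[start:start + window])
        buckets[frozenset(vec.items())].append(start)
    return [starts for starts in buckets.values() if len(starts) > 1]


# Two function prologues that differ only in registers and constants.
code = [
    "push ebp", "mov ebp, esp", "mov eax, [ebp+8]", "add eax, 1",
    "xor eax, eax", "ret",
    "push ebp", "mov ebp, esp", "mov ecx, [ebp+12]", "add ecx, 0x10",
]
print(find_clones(code))  # [[0, 6]]
```

Because the bucket key is the whole vector, only windows with identical instruction-class counts are grouped. Reporting near-clones would instead mean comparing vectors with `l1_distance` against a threshold, or using an approximate nearest-neighbor index. That approach is closer in spirit to the clustering the abstract describes.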
Year: 2009
DOI: 10.1145/1572272.1572287
Venue: ISSTA
Keywords: binary executables, assembly instruction, binary code, large software project, significant code duplication, pasting code, source code, Windows XP system binary, detecting code clone, Windows XP system, existing tree similarity framework, intellectual property, false positive
Field: Static program analysis, Source code, Computer science, Code generation, Theoretical computer science, Real-time computing, Redundant code, Legacy code, Code refactoring, Executable, Dead code
DocType: Conference
Citations: 69
PageRank: 2.24
References: 20
Authors: 5
Name | Order | Citations | PageRank
Andreas Sæbjørnsen | 1 | 87 | 3.53
Jeremiah Willcock | 2 | 190 | 9.73
Thomas Panas | 3 | 87 | 3.39
Daniel Quinlan | 4 | 139 | 8.27
Zhendong Su | 5 | 3397 | 175.76