Title
BYTEWEIGHT: learning to recognize functions in binary code
Abstract
Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, with four different optimization levels, and across two different operating systems. In our experiments with 2,200 binaries, we found that BYTEWEIGHT missed 44,621 functions in comparison with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.
Year
2014
Venue
USENIX Security
Field
Binary rewriting, Computer science, A priori and a posteriori, Reverse engineering, Binary code, Control-flow integrity, Compiler, Theoretical computer science, Program analysis, Binary number
DocType
Conference
Citations
39
PageRank
1.22
References
21
Authors
5
Name             Order  Citations  PageRank
Tiffany Bao      1      64         8.17
Jonathan Burket  2      39         1.22
Maverick Woo     3      173        7.47
Rafael Turner    4      39         1.22
David Brumley    5      2940       142.75