Title
Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries
Abstract
When analyzing an untrusted binary, reverse engineers usually rely on ad-hoc collections of interesting dynamic patterns (known as behaviors in the malware-analysis community) and static patterns (known as signatures in the antivirus community). Such patterns are often part of the analyst's skill set, sometimes implemented in manually created post-processing scripts. It would be desirable to find such behaviors automatically, present them to analysts, and create a systematic catalog of matching rules and relevant implementations. We propose Jackdaw, a system that finds interesting dynamic patterns and ranks them to unveil potentially interesting behaviors. It then annotates them with static information, capturing the distinct implementations of each behavior across different malware families. Finally, Jackdaw associates semantic information with the behaviors, creating a descriptive summary that helps analysts query the catalog of behaviors by type. To do this, it leverages the dynamic information and an indexed Web-based knowledge database. We implement and demonstrate Jackdaw on the Win32 API, although the technique can be generalized to any OS. On a dataset of 2,136 distinct binaries, including both malicious and benign libraries and executables, we compared the automatically extracted behaviors against a ground truth of 44 behaviors created manually by expert analysts. Jackdaw found 77.3% of them and excluded spurious behaviors in 99.6% of cases. We also discovered 466 novel behaviors; manual exploration and review by expert reverse engineers revealed interesting findings among them and confirmed the correctness of the semantic tagging.
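How Jackdaw mines and ranks its dynamic patterns is not detailed in this summary. Purely as an illustration, the Python sketch below swaps in a simple heuristic that is not Jackdaw's algorithm: it treats each binary's dynamic trace as an ordered list of Win32 API names, extracts contiguous API-call n-grams, and ranks them by how many distinct binaries exhibit them, discarding patterns that are too rare (likely noise) or too common (likely boilerplate). The function names, the thresholds `min_binaries` and `max_fraction`, and the sample traces are all hypothetical. The `percentage` helper checks the reported recall: 77.3% of the 44 ground-truth behaviors corresponds to 34 of them.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one trace per binary, listing the Win32 API
# names observed (in order) while the binary ran in a sandbox.
Trace = List[str]
Pattern = Tuple[str, ...]


def ngrams(trace: Trace, n: int) -> Set[Pattern]:
    """Return the set of contiguous length-n API-call subsequences in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def rank_candidate_behaviors(
    traces: Dict[str, Trace],
    n: int = 3,
    min_binaries: int = 2,
    max_fraction: float = 0.9,
) -> List[Tuple[Pattern, int]]:
    """Rank recurring API-call n-grams by the number of binaries showing them.

    This is an illustrative stand-in heuristic, not Jackdaw's ranking method.
    """
    doc_freq: Counter = Counter()
    for trace in traces.values():
        # Each pattern is counted at most once per binary (document frequency).
        doc_freq.update(ngrams(trace, n))
    total = len(traces)
    kept = [
        (pattern, count)
        for pattern, count in doc_freq.items()
        # Drop patterns seen in too few binaries or in almost all of them.
        if count >= min_binaries and count / total <= max_fraction
    ]
    # Most widespread first; ties broken lexicographically for determinism.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def percentage(found: int, total: int) -> float:
    """Share of ground-truth behaviors found, as a percentage (1 decimal)."""
    return round(100 * found / total, 1)


if __name__ == "__main__":
    # Hypothetical traces; the API names are real Win32 functions.
    traces = {
        "sample_a.exe": ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey",
                         "CreateFileW", "WriteFile", "CloseHandle"],
        "sample_b.exe": ["CreateFileW", "WriteFile", "CloseHandle",
                         "RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
        "sample_c.dll": ["CreateFileW", "ReadFile", "CloseHandle"],
    }
    for pattern, count in rank_candidate_behaviors(traces):
        print(count, " -> ".join(pattern))
    print(percentage(34, 44))  # 77.3
```

On these sample traces, only the file-write and registry-write sequences recur across binaries, so they are the two candidates kept. A real system would also need the static annotation and semantic tagging steps described above, which this sketch does not attempt.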
Year
2015
DOI
10.1007/978-3-319-20550-2_7
Venue
Detection of Intrusions and Malware & Vulnerability Assessment
Field
Data mining, Information retrieval, Control flow graph, Computer science, Reverse engineering, Correctness, Implementation, Ground truth, Malware, Scripting language, Executable
DocType
Conference
Citations
5
PageRank
0.41
References
34
Authors
4
Name             Order   Citations   PageRank
Mario Polino     1       112         6.94
Andrea Scorti    2       5           0.41
Federico Maggi   3       524         37.68
Stefano Zanero   4       736         53.78