Title
Compressed Firmware Classification Based on Extra Trees and Doc2Vec
Abstract
Firmware formats vary from vendor to vendor, which makes it difficult to determine which vendor or device a firmware image belongs to, or to identify the firmware used in an embedded device. Current firmware analysis tools mainly distinguish firmware by static signatures in the binary code. However, extracting a signature usually requires careful manual analysis by professionals, which costs significant time and effort. In this paper, we use Doc2Vec to extract and process the character information in firmware, combine it with the file size, file entropy, and arithmetic mean of the bytes as firmware features, and build a firmware classifier with the Extra Trees model. The evaluation is performed on 1,190 firmware files from 5 router vendors. The classifier achieves an accuracy of 97.18%, which is higher than that of current approaches. The results show that the proposed approach is feasible and effective.
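The statistical features named in the abstract (file size, file entropy, and arithmetic mean of bytes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `statistical_features` and `printable_strings`, the use of Shannon entropy in bits per byte, and the minimum string length of 4 are all assumptions made for illustration. The Doc2Vec embedding of the extracted strings and the Extra Trees classifier are omitted here; in practice they would consume the output of these helpers.

```python
import math
import re
from collections import Counter


def statistical_features(data: bytes) -> dict:
    """Return file size, Shannon entropy (bits per byte), and mean byte value.

    Assumption: "file entropy" is taken to mean Shannon entropy over the
    byte-value histogram; the paper may define it differently.
    """
    size = len(data)
    if size == 0:
        return {"size": 0, "entropy": 0.0, "byte_mean": 0.0}
    counts = Counter(data)  # histogram of byte values 0..255
    entropy = -sum((c / size) * math.log2(c / size) for c in counts.values())
    byte_mean = sum(data) / size  # arithmetic mean of the byte values
    return {"size": size, "entropy": entropy, "byte_mean": byte_mean}


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Extract runs of printable ASCII of at least min_len characters.

    Hypothetical stand-in for the "character information" that would be
    tokenized and passed to a Doc2Vec model.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]
```

Compressed or encrypted firmware tends to have entropy close to 8 bits per byte and a byte mean near 127.5, which is why these cheap statistics can help when few readable strings survive compression.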
Year: 2021
DOI: 10.1155/2021/2666153
Venue: SCIENTIFIC PROGRAMMING
DocType: Journal
Volume: 2021
ISSN: 1058-9244
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name           Order  Citations  PageRank
Jing Qiu       1      0          0.34
Xiaoxu Geng    2      0          0.34
Guang-Lu Sun   3      58         16.03