Title
Meaningful variable names for decompiled code: a machine translation approach.
Abstract
When code is compiled, information is lost, including some of the structure of the original source code as well as local identifier names. Existing decompilers can reconstruct much of the original source code, but typically use meaningless placeholder variables for identifier names. Using variable names which are more natural in the given context can make the code much easier to interpret, despite the fact that variable names have no effect on the execution of the program. In theory, it is impossible to recover the original identifier names since that information has been lost. However, most code is natural: it is highly repetitive and predictable based on the context. In this paper we propose a technique that assigns variables meaningful names by taking advantage of this naturalness property. We consider decompiler output to be a noisy distortion of the original source code, where the original source code is transformed into the decompiler output. Using this noisy channel model, we apply standard statistical machine translation approaches to choose natural identifiers, combining a translation model trained on a parallel corpus with a language model trained on unmodified C code. We generate a large parallel corpus from 1.2 TB of C source code obtained from GitHub. Under the most conservative assumptions, our technique is still able to recover the original variable names up to 16.2% of the time, which represents a lower bound for performance.
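The noisy channel idea in the abstract can be illustrated with a minimal sketch: choose the candidate name that maximizes the product of a translation-model probability and a language-model probability. Everything below is an assumption made for illustration, not the authors' implementation: the probability tables are invented toy numbers, `best_name` is a hypothetical helper, and conditioning on a single previous token stands in for a full language model over the surrounding code.

```python
import math

# Hypothetical toy probabilities, invented purely for illustration.
# TRANSLATION_MODEL[placeholder][name] approximates P(placeholder | name),
# as it might be estimated from a parallel corpus.
TRANSLATION_MODEL = {
    "v1": {"i": 0.5, "len": 0.3, "buf": 0.2},
}

# LANGUAGE_MODEL[(previous_token, name)] approximates P(name | context),
# as it might be estimated from unmodified source code.
LANGUAGE_MODEL = {
    ("for", "i"): 0.6,
    ("for", "len"): 0.1,
    ("for", "buf"): 0.05,
    ("memcpy", "buf"): 0.7,
    ("memcpy", "i"): 0.01,
}

# Small floor probability for (context, name) pairs missing from the toy table.
UNSEEN = 1e-6


def best_name(placeholder, previous_token):
    """Return the candidate name with the highest noisy-channel score.

    The score is log P(placeholder | name) + log P(name | context).
    """
    candidates = TRANSLATION_MODEL[placeholder]

    def score(name):
        tm = candidates[name]
        lm = LANGUAGE_MODEL.get((previous_token, name), UNSEEN)
        return math.log(tm) + math.log(lm)

    return max(candidates, key=score)
```

In this toy setup the language model can override the translation model: `best_name("v1", "memcpy")` selects `buf` even though `i` has the higher translation probability, because `buf` is far more likely in that context.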
Year
2018
DOI
10.1145/3196321.3196330
Venue
ICPC
Keywords
Decompilation, Understandability, Statistical Machine Translation, Renaming Identifiers
Field
Data mining, Identifier, Source code, Computer science, Machine translation, Natural language, Natural language processing, Noisy channel model, Artificial intelligence, Program comprehension, Decompiler, Language model
DocType
Conference
ISSN
1092-8138
ISBN
978-1-4503-5714-2
Citations
4
PageRank
0.42
References
28
Authors
5
Name, Order, Citations, PageRank
Alan Jaffe, 1, 5, 1.45
Jeremy Lacomis, 2, 10, 2.55
Edward J. Schwartz, 3, 537, 23.29
Claire Le Goues, 4, 1766, 68.79
Bogdan Vasilescu, 5, 935, 48.75