Abstract
---
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied.
In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions.
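The analogy benchmark described above is typically scored with vector arithmetic: for a query "a is to b as c is to ?", the candidate whose embedding is closest (by cosine similarity) to vec(b) − vec(a) + vec(c) is taken as the top-1 answer. A minimal sketch of that scoring scheme, using tiny hypothetical embeddings (the function names and vectors are illustrative, not from the paper's Linux-kernel benchmark):

```python
import math

# Hypothetical 2-D embeddings standing in for learned trace-abstraction vectors.
emb = {
    "fopen":  [1.0, 0.0],
    "fclose": [1.0, 1.0],
    "malloc": [0.0, 0.0],
    "free":   [0.0, 1.0],
}

def cos(u, v):
    """Cosine similarity between two vectors (epsilon guards zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def solve_analogy(a, b, c):
    """Answer 'a is to b as c is to ?' via the vec(b) - vec(a) + vec(c) rule,
    excluding the three query words from the candidate set."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(emb[w], target))

# "fopen is to fclose as malloc is to ...?"
answer = solve_analogy("fopen", "fclose", "malloc")  # → "free"
```

Top-1 accuracy over a benchmark is then the fraction of analogy queries for which this nearest-neighbor answer matches the expected word.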
Year | DOI | Venue
---|---|---
2018 | 10.1145/3236024.3236085 | ESEC/SIGSOFT FSE

Keywords | DocType | Volume
---|---|---
Word Embeddings, Analogical Reasoning, Program Understanding, Linux | Conference | abs/1803.06686

ISBN | Citations | PageRank
---|---|---
978-1-4503-5573-5 | 6 | 0.42

References | Authors
---|---
37 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Jordan Henkel | 1 | 6 | 1.78 |
Shuvendu K. Lahiri | 2 | 1424 | 68.18 |
Ben Liblit | 3 | 1209 | 74.47 |
Thomas W. Reps | 4 | 39 | 5.67 |