Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces.

Paper Info

Title
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces.

Abstract
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions.
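As a rough illustration of what "top-1 accuracy" on an analogy benchmark means, the sketch below solves analogies of the form a : b :: c : ? and scores the answers. It assumes the common vector-offset method (pick the word nearest to b - a + c by cosine similarity); the abstract does not say which procedure the authors used. The token names and 3-dimensional vectors are hypothetical, hand-picked values and are not taken from the paper's Linux kernel benchmark.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# Real embeddings would be learned from abstracted symbolic traces.
EMBEDDINGS = {
    "lock": (1.0, 0.0, 0.0),
    "unlock": (1.0, 1.0, 0.0),
    "alloc": (0.0, 0.0, 1.0),
    "free": (0.0, 1.0, 1.0),
    "print": (0.5, 0.2, 0.5),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, embeddings):
    """Answer a : b :: c : ? with the word closest to b - a + c.

    The three query words are excluded from the candidates, as is
    conventional for this kind of evaluation.
    """
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


def top1_accuracy(analogies, embeddings):
    """Fraction of (a, b, c, expected) analogies whose best answer is expected."""
    hits = sum(
        solve_analogy(a, b, c, embeddings) == expected
        for a, b, c, expected in analogies
    )
    return hits / len(analogies)


# Hypothetical analogies; a real benchmark would hold thousands of them.
ANALOGIES = [
    ("lock", "unlock", "alloc", "free"),
    ("alloc", "free", "lock", "unlock"),
]

if __name__ == "__main__":
    print(solve_analogy("lock", "unlock", "alloc", EMBEDDINGS))
    print(top1_accuracy(ANALOGIES, EMBEDDINGS))
```

Top-1 accuracy counts an analogy as solved only when the single highest-ranked candidate is the expected answer, which makes it a stricter measure than top-k variants.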

Year	2018
DOI	10.1145/3236024.3236085
Venue	ESEC/SIGSOFT FSE
Keywords	Word Embeddings, Analogical Reasoning, Program Understanding, Linux
DocType	Conference
Volume	abs/1803.06686
ISBN	978-1-4503-5573-5
Citations	6
PageRank	0.42
References	37
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Jordan Henkel	1	6	1.78
Shuvendu K. Lahiri	2	1424	68.18
Ben Liblit	3	1209	74.47
Thomas W. Reps	4	39	5.67
