Assessing the Generalizability of Code2vec Token Embeddings. - Citegraph

Paper Info

Title
Assessing the Generalizability of Code2vec Token Embeddings.

Abstract
Many Natural Language Processing (NLP) tasks, such as sentiment analysis or syntactic parsing, have benefited from the development of word embedding models. In particular, regardless of the training algorithms, the learned embeddings have often been shown to be generalizable to different NLP tasks. In contrast, despite recent momentum on word embeddings for source code, the literature lacks evidence of their generalizability beyond the example task they have been trained for. In this experience paper, we identify 3 potential downstream tasks, namely code comments generation, code authorship identification, and code clones detection, that source code token embedding models can be applied to. We empirically assess a recently proposed code token embedding model, namely code2vec's token embeddings. Code2vec was trained on the task of predicting method names, and while there is potential for using the vectors it learns on other tasks, it has not been explored in literature. Therefore, we fill this gap by focusing on its generalizability for the tasks we have identified. Eventually, we show that source code token embeddings cannot be readily leveraged for the downstream tasks. Our experiments even show that our attempts to use them do not result in any improvements over less sophisticated methods. We call for more research into effective and general use of code embeddings.

Year	2019
DOI	10.1109/ASE.2019.00011
Venue	ASE
Keywords	Code Embeddings, Distributed Representations, Big Code
Field	Generalizability theory, Big code, Syntactic parsing, Embedding, Sentiment analysis, Computer science, Source code, Theoretical computer science, Artificial intelligence, Natural language processing, Word embedding, Security token
DocType	Conference
ISSN	1938-4300
ISBN	978-1-7281-2508-4
Citations	5
PageRank	0.39
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Hong Jin Kang	1	6	3.10
Tegawendé F. Bissyandé	2	863	63.90
David Lo	3	5346	259.67
