Abstract
Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two complementary representations of the same computer program. Traditionally, designers of machine learning models have relied predominantly on either Structure or Context. We propose a new model, which jointly learns on the Context and Structure of source code. In contrast to previous approaches, our model uses only language-agnostic features, i.e., source code and features that can be computed directly from the AST. Besides obtaining state-of-the-art results on monolingual code summarization on all five programming languages considered in this work, we propose the first multilingual code summarization model. We show that jointly training on non-parallel data from multiple programming languages improves results on all individual languages, with the strongest gains on low-resource languages. Remarkably, multilingual training only from Context does not lead to the same improvements, highlighting the benefits of combining Structure and Context for representation learning on code.
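To make the Context/Structure distinction in the abstract concrete, here is a minimal illustration (not the paper's model) using Python's standard `ast` module: the same program is viewed once as raw source tokens (Context) and once as node types from its parsed AST (Structure), both of which are language-agnostic in the sense that they require no hand-engineered, language-specific features.

```python
import ast

# One toy program, two complementary views of it.
source = "def add(a, b):\n    return a + b\n"

# Context view: the raw source text (here, whitespace-split tokens).
context_tokens = source.split()

# Structure view: node types from a breadth-first walk of the AST.
tree = ast.parse(source)
structure = [type(node).__name__ for node in ast.walk(tree)]

print(context_tokens)   # tokens drawn from the source text
print(structure[:4])    # → ['Module', 'FunctionDef', 'arguments', 'Return']
```

A model that uses only such views (plus, e.g., AST paths or distances computable from `tree`) can be applied to any language with a parser, which is what enables joint multilingual training on non-parallel corpora.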
Year | Venue | DocType |
---|---|---|
2021 | ICLR | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors (5)
Name | Order | Citations | PageRank |
---|---|---|---|
Daniel Zügner | 1 | 0 | 0.34 |
Tobias Kirschstein | 2 | 0 | 0.68 |
Michele Catasta | 3 | 449 | 31.64 |
Jure Leskovec | 4 | 18769 | 886.50 |
Stephan Günnemann | 5 | 833 | 69.26 |