Abstract
Source code (Context) and its parsed abstract syntax tree (AST; Structure) are two complementary representations of the same computer program. Traditionally, designers of machine learning models have relied predominantly on either Structure or Context. We propose a new model that jointly learns on the Context and Structure of source code. In contrast to previous approaches, our model uses only language-agnostic features, i.e., source code and features that can be computed directly from the AST. Besides obtaining state-of-the-art results on monolingual code summarization for all five programming languages considered in this work, we propose the first multilingual code summarization model. We show that jointly training on non-parallel data from multiple programming languages improves results on all individual languages, with the strongest gains on low-resource languages. Remarkably, multilingual training only on Context does not lead to the same improvements, highlighting the benefits of combining Structure and Context for representation learning on code.
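To make the abstract's notion of "features that can be computed directly from the AST" concrete, here is a minimal sketch, assuming Python as the example language and its standard `ast` module as the parser, of one language-agnostic structural feature: pairwise shortest-path distances between AST nodes. This is an illustrative toy under those assumptions, not the authors' implementation; the helper names `ast_edges` and `shortest_path_distances` are hypothetical.

```python
# Toy illustration (not the paper's code): derive a language-agnostic
# structural feature -- pairwise hop distances between AST nodes -- from
# source code alone, using Python's standard-library parser.
import ast
from collections import deque

def ast_edges(tree):
    """Yield (parent, child) pairs over all nodes of a parsed AST."""
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            yield parent, child

def shortest_path_distances(tree):
    """BFS from every node over the undirected AST to get pairwise hop counts."""
    nodes = list(ast.walk(tree))
    index = {id(n): i for i, n in enumerate(nodes)}
    adj = {i: [] for i in range(len(nodes))}
    for parent, child in ast_edges(tree):
        pi, ci = index[id(parent)], index[id(child)]
        adj[pi].append(ci)   # treat tree edges as undirected
        adj[ci].append(pi)
    dist = {}
    for start in range(len(nodes)):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[start] = seen
    return nodes, dist

source = "def add(a, b):\n    return a + b\n"
nodes, dist = shortest_path_distances(ast.parse(source))
print(f"{len(nodes)} AST nodes; hop distance from root to node 3 is {dist[0][3]}")
```

In the paper's setting, relative distances of this kind are one way Structure can enter the model alongside the token sequence (Context); the point of the sketch is only that such features require nothing language-specific beyond a parser.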
| Original language | English |
|---|---|
| State | Published - 2021 |
| Event | 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online, Austria |
| Duration | 3 May 2021 → 7 May 2021 |
Conference
| Conference | 9th International Conference on Learning Representations, ICLR 2021 |
|---|---|
| Country/Territory | Austria |
| City | Virtual, Online |
| Period | 3/05/21 → 7/05/21 |