Open Vocabulary Learning on Source Code with a Graph-Structured Cache

2018-10-18 · ICLR 2019 · Code available

Milan Cvitkovic, Badal Singh, Anima Anandkumar


Code

github.com/mwcvitkovic/Deep_Learning_On_Code_With_A_Graph_Vocabulary--Code_Preprocessor
Official, in paper
github.com/mwcvitkovic/open-vocabulary-learning-on-source-code-with-a-graph-structured-cache--code-preprocessor
github.com/Microsoft/graph-based-code-modelling
TensorFlow

Abstract

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task, with over 100% relative improvement on the latter, at the cost of a moderate increase in computation time.
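
To make the cache structure in the abstract concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' preprocessor: it treats every non-keyword identifier in a Python snippet as a "word", and the function name `build_graph_structured_cache` and the edge representation are hypothetical. It covers only the step of building one cache node per word with edges to that word's occurrences; the Graph Neural Network that would consume the graph is out of scope.

```python
import io
import keyword
import tokenize
from collections import defaultdict


def build_graph_structured_cache(source: str):
    """Build a toy cache graph for a Python snippet (hypothetical helper).

    Returns:
        occurrences: list of (word, line, column), one per identifier token.
        cache_edges: dict mapping each distinct word (a cache node) to the
                     indices of its occurrences in `occurrences` (the edges).
    """
    occurrences = []
    cache_edges = defaultdict(list)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Assumption: a "word" is any identifier that is not a keyword.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            cache_edges[tok.string].append(len(occurrences))
            occurrences.append((tok.string, tok.start[0], tok.start[1]))
    return occurrences, dict(cache_edges)


snippet = "total = 0\nfor item in items:\n    total = total + item\n"
occurrences, cache_edges = build_graph_structured_cache(snippet)

for word, indices in cache_edges.items():
    positions = [occurrences[i][1:] for i in indices]
    print(f"{word!r} -> occurrences at (line, col): {positions}")
```

Because a word gets its node the first time it is seen, a previously unseen identifier needs no entry in a fixed vocabulary; it simply becomes a new node linked to the places where it is used.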

Tasks

Code Completion
Graph Neural Network
