
Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

2020-04-20 · ACL 2020 · Code Available

Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig


Abstract

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
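The abstract mentions retrieval-based data re-sampling, i.e. weighting the automatically mined StackOverflow pairs so that examples resembling the target distribution are sampled more often for pre-training. The paper's actual procedure is not described on this page; the following is only a minimal sketch of the general idea, assuming precomputed similarity scores, and all names (`resample_by_retrieval`, the toy pairs) are hypothetical.

```python
import random

def resample_by_retrieval(mined_pairs, scores, k, seed=0):
    """Hypothetical sketch: draw k mined NL-code pairs with probability
    proportional to a retrieval similarity score against the target
    dataset, so pairs resembling the evaluation distribution are
    over-represented in the pre-training sample."""
    rng = random.Random(seed)
    total = sum(scores)
    weights = [s / total for s in scores]  # normalize to a distribution
    return rng.choices(mined_pairs, weights=weights, k=k)

# Toy example: the pair with the highest similarity score dominates.
pairs = [("sort a list", "sorted(xs)"),
         ("open a file", "open(p)"),
         ("match a regex", "re.match(p, s)")]
scores = [0.9, 0.05, 0.05]
sample = resample_by_retrieval(pairs, scores, k=100)
```

In practice the similarity scores would come from a retrieval model comparing each mined pair against the CoNaLa training distribution; here they are fixed constants purely for illustration.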

Benchmark Results

Dataset     Model                                    Metric  Claimed  Verified  Status
CoNaLa      External Knowledge With API + Reranking  BLEU    32.26              Unverified
CoNaLa      External Knowledge With API              BLEU    30.69              Unverified
CoNaLa-Ext  External Knowledge With API + Reranking  BLEU    20.54              Unverified
CoNaLa-Ext  External Knowledge With API              BLEU    20.37              Unverified