
Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

2020-04-20 · ACL 2020 · Code Available

Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig


Abstract

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
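The abstract mentions retrieval-based data re-sampling, i.e. weighting the automatically mined StackOverflow pairs so that examples resembling the target distribution are sampled more often for pre-training. The paper's actual procedure is not described on this page; the following is only a minimal sketch of the general idea, assuming precomputed similarity scores, and all names (`resample_by_retrieval`, the toy pairs) are hypothetical.

```python
import random

def resample_by_retrieval(mined_pairs, scores, k, seed=0):
    """Hypothetical sketch: draw k mined NL-code pairs with probability
    proportional to a retrieval similarity score against the target
    dataset, so pairs resembling the evaluation distribution are
    over-represented in the pre-training sample."""
    rng = random.Random(seed)
    total = sum(scores)
    weights = [s / total for s in scores]  # normalize to a distribution
    return rng.choices(mined_pairs, weights=weights, k=k)

# Toy example: the pair with the highest similarity score dominates.
pairs = [("sort a list", "sorted(xs)"),
         ("open a file", "open(p)"),
         ("match a regex", "re.match(p, s)")]
scores = [0.9, 0.05, 0.05]
sample = resample_by_retrieval(pairs, scores, k=100)
```

In practice the similarity scores would come from a retrieval model comparing each mined pair against the CoNaLa training distribution; here they are fixed constants purely for illustration.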

Benchmark Results

Dataset     Model                                    Metric  Claimed  Verified  Status
CoNaLa      External Knowledge With API + Reranking  BLEU    32.26              Unverified
CoNaLa      External Knowledge With API              BLEU    30.69              Unverified
CoNaLa-Ext  External Knowledge With API + Reranking  BLEU    20.54              Unverified
CoNaLa-Ext  External Knowledge With API              BLEU    20.37              Unverified