CodeBERT: A Pre-Trained Model for Programming and Natural Languages

2020-02-19 · Findings of the Association for Computational Linguistics · Code Available

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou

Abstract

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search and code documentation generation. We develop CodeBERT with a Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.
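The replaced token detection task mentioned in the abstract can be illustrated with a small sketch. It shows only how a training example might be built: selected positions are overwritten with tokens drawn from a generator, and each position is labeled as original (1) or replaced (0). The function names `build_rtd_example` and `uniform_generator` are hypothetical, and the uniform sampler is a toy stand-in, not the generators used by the authors.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical helper: a sampler takes (tokens, position) and returns a token.
Sampler = Callable[[Sequence[str], int], str]


def build_rtd_example(
    tokens: Sequence[str],
    mask_positions: Sequence[int],
    sample_fn: Sampler,
) -> Tuple[List[str], List[int]]:
    """Corrupt the chosen positions and label every token.

    Label 1 means the token equals the original, 0 means it was replaced.
    A sampled token that happens to match the original is labeled 1.
    """
    corrupted = list(tokens)
    for position in mask_positions:
        corrupted[position] = sample_fn(tokens, position)
    labels = [int(new == old) for new, old in zip(corrupted, tokens)]
    return corrupted, labels


def uniform_generator(vocab: Sequence[str], rng: random.Random) -> Sampler:
    """Toy stand-in for a learned generator (assumption for illustration)."""
    def sample(tokens: Sequence[str], position: int) -> str:
        return rng.choice(vocab)
    return sample


tokens = ["return", "max", "(", "a", ",", "b", ")"]
vocab = sorted(set(tokens) | {"min", "sum"})
corrupted, labels = build_rtd_example(
    tokens, [1, 3], uniform_generator(vocab, random.Random(0))
)
print(corrupted)
print(labels)
```

A discriminator would then be trained to predict `labels` from `corrupted`. In practice a stronger generator yields more plausible replacements, which makes the detection task harder than it is with uniform sampling.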

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CodeSearchNet | Transformer | Smoothed BLEU-4 | 14.31 | | Unverified |
| CodeSearchNet | seq2seq | Smoothed BLEU-4 | 13.36 | | Unverified |
| CodeSearchNet | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 15.99 | | Unverified |
| CodeSearchNet | CodeBERT (MLM) | Smoothed BLEU-4 | 15.55 | | Unverified |
| CodeSearchNet | pre-train w/ code only | Smoothed BLEU-4 | 15.15 | | Unverified |
| CodeSearchNet | CodeBERT (RTD) | Smoothed BLEU-4 | 15.03 | | Unverified |
| CodeSearchNet | RoBERTa | Smoothed BLEU-4 | 14.52 | | Unverified |
| CodeSearchNet - Go | seq2seq | Smoothed BLEU-4 | 23.48 | | Unverified |
| CodeSearchNet - Go | CodeBERT (MLM) | Smoothed BLEU-4 | 26.79 | | Unverified |
| CodeSearchNet - Go | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 26.66 | | Unverified |
| CodeSearchNet - Go | pre-train w/ code only | Smoothed BLEU-4 | 26.39 | | Unverified |
| CodeSearchNet - Go | RoBERTa | Smoothed BLEU-4 | 26.09 | | Unverified |
| CodeSearchNet - Go | CodeBERT (RTD) | Smoothed BLEU-4 | 26.02 | | Unverified |
| CodeSearchNet - Java | CodeBERT (MLM) | Smoothed BLEU-4 | 13.59 | | Unverified |
| CodeSearchNet - Java | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 14.56 | | Unverified |
| CodeSearchNet - Java | seq2seq | Smoothed BLEU-4 | 11.42 | | Unverified |
| CodeSearchNet - Java | Transformer | Smoothed BLEU-4 | 12.57 | | Unverified |
| CodeSearchNet - Java | CodeBERT (RTD) | Smoothed BLEU-4 | 12.72 | | Unverified |
| CodeSearchNet - Java | pre-train w/ code only | Smoothed BLEU-4 | 13.07 | | Unverified |
| CodeSearchNet - Java | RoBERTa | Smoothed BLEU-4 | 13.2 | | Unverified |
| CodeSearchNet - JavaScript | CodeBERT (MLM) | Smoothed BLEU-4 | 8.51 | | Unverified |
| CodeSearchNet - JavaScript | Transformer | Smoothed BLEU-4 | 25.61 | | Unverified |
| CodeSearchNet - JavaScript | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 9.54 | | Unverified |
| CodeSearchNet - JavaScript | CodeBERT (RTD) | Smoothed BLEU-4 | 8.73 | | Unverified |
| CodeSearchNet - JavaScript | pre-train w/ code only | Smoothed BLEU-4 | 8.3 | | Unverified |
| CodeSearchNet - JavaScript | seq2seq | Smoothed BLEU-4 | 6.88 | | Unverified |
| CodeSearchNet - JavaScript | RoBERTa | Smoothed BLEU-4 | 5.72 | | Unverified |
| CodeSearchNet - Php | Transformer | Smoothed BLEU-4 | 18.25 | | Unverified |
| CodeSearchNet - Php | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 21.32 | | Unverified |
| CodeSearchNet - Php | CodeBERT (MLM) | Smoothed BLEU-4 | 21 | | Unverified |
| CodeSearchNet - Php | pre-train w/ code only | Smoothed BLEU-4 | 20.71 | | Unverified |
| CodeSearchNet - Php | CodeBERT (RTD) | Smoothed BLEU-4 | 20.25 | | Unverified |
| CodeSearchNet - Php | RoBERTa | Smoothed BLEU-4 | 19.9 | | Unverified |
| CodeSearchNet - Php | seq2seq | Smoothed BLEU-4 | 18.4 | | Unverified |
| CodeSearchNet - Python | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 15.41 | | Unverified |
| CodeSearchNet - Python | seq2seq | Smoothed BLEU-4 | 13.04 | | Unverified |
| CodeSearchNet - Python | Transformer | Smoothed BLEU-4 | 13.44 | | Unverified |
| CodeSearchNet - Python | RoBERTa | Smoothed BLEU-4 | 14.92 | | Unverified |
| CodeSearchNet - Python | pre-train w/ code only | Smoothed BLEU-4 | 15.12 | | Unverified |
| CodeSearchNet - Python | CodeBERT (MLM) | Smoothed BLEU-4 | 15.48 | | Unverified |
| CodeSearchNet - Ruby | RoBERTa | Smoothed BLEU-4 | 7.26 | | Unverified |
| CodeSearchNet - Ruby | pre-train w/ code only | Smoothed BLEU-4 | 7.36 | | Unverified |
| CodeSearchNet - Ruby | seq2seq | Smoothed BLEU-4 | 6.96 | | Unverified |
| CodeSearchNet - Ruby | CodeBERT (MLM+RTD) | Smoothed BLEU-4 | 8.46 | | Unverified |
| CodeSearchNet - Ruby | CodeBERT (MLM) | Smoothed BLEU-4 | 7.95 | | Unverified |
| CodeSearchNet - Ruby | Transformer | Smoothed BLEU-4 | 7.87 | | Unverified |
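For readers unfamiliar with the metric, the following is a minimal sketch of one common way to compute a sentence-level smoothed BLEU-4 score. It assumes add-one smoothing on the 2-gram to 4-gram precisions and a single reference. The page does not state which smoothing variant produced the scores above, so this is an illustration of the idea and not the evaluation script behind them. The function name `smoothed_bleu4` is hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu4(hypothesis: List[str], reference: List[str]) -> float:
    """Sentence-level BLEU-4 in [0, 1] with add-one smoothing for n > 1.

    Assumption: one reference, uniform weights over 1- to 4-grams.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
        total = sum(hyp.values())
        if n > 1:
            # Add-one smoothing keeps higher-order precisions non-zero.
            matches += 1
            total += 1
        if matches == 0:
            return 0.0  # no unigram overlap at all
        log_precision_sum += math.log(matches / total)
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum / 4)


reference = "return the larger of two numbers".split()
hypothesis = "return the maximum of two numbers".split()
print(round(100 * smoothed_bleu4(hypothesis, reference), 2))
```

The function returns a value between 0 and 1; the example multiplies by 100 on the assumption that the listed scores use a 0 to 100 scale.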
