CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

2021-02-09

Shuai Lu, Daya Guo, Shuo Ren, JunJie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

Code

Repository	Framework	Stars	Notes
github.com/microsoft/CodeXGLUE	PyTorch	1,809	Official, in paper
github.com/sberbank-ai/fusion_brain_aij2021	PyTorch	50
github.com/kilimanj4r0/code-summarization-beyond-function-level	PyTorch	11
github.com/Avmb/semantic_neq_game	none	1
github.com/deeplearnxmu/unigencoder	PyTorch	1
github.com/yueyuel/programgen-lms-reliability	PyTorch	1

Abstract

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

Tasks

BIG-bench Machine Learning, Clone Detection, Cloze Test, Code Completion, Code Generation, Code Repair, Code Search, Code Summarization, Code Translation, Decoder, Defect Detection, Document Translation, Text-to-Code Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CodeXGLUE - CT-all	CodeBERT(MLM)	Go	83.31	—	Unverified
CodeXGLUE - CT-maxmin	CodeBERT(MLM)	Go	90.79	—	Unverified
