Code Generation
Code Generation is the task of predicting explicit code or program structure from multimodal sources such as incomplete code, programs in another programming language, natural-language descriptions, or execution examples. Code generation tools can support the development of automatic programming systems and improve programming productivity.
Source: Deep Learning for Source Code Modeling and Generation
Image source: Measuring Coding Challenge Competence With APPS
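
As a concrete illustration of the task, a benchmark instance typically pairs a short natural-language specification with hidden unit tests, and the model must produce code that satisfies them. The sketch below is a hypothetical MBPP/HumanEval-style item; the prompt, function name, and tests are illustrative, not taken from any benchmark:

```python
# Hypothetical task: prompt, function name, and tests are illustrative only.
PROMPT = "Write a function fibonacci(n) that returns the n-th Fibonacci number (0-indexed)."

# A candidate completion a code-generation model might produce.
def fibonacci(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Hidden unit tests used to score the completion.
assert fibonacci(0) == 0
assert fibonacci(5) == 5
assert fibonacci(10) == 55
```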
Benchmark Results

Results are reported on the following datasets, with one leaderboard table per dataset in the order listed: MBPP, APPS, CoNaLa, Django, WikiSQL, RES-Q, CodeContests, HumanEval, PECC, WebApp1K-React, CoNaLa-Ext, and WebApp1k-Duo-React.

MBPP

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EG-CFG (DeepSeek-V3-0324) | Accuracy | 96.6 | — | Unverified |
| 2 | QualityFlow (Sonnet-3.5) | Accuracy | 94.2 | — | Unverified |
| 3 | o1-mini + MapCoder (Hamming.ai) | Accuracy | 93.2 | — | Unverified |
| 4 | MGDebugger (DeepSeek-V3-0324) | Accuracy | 92.4 | — | Unverified |
| 5 | GPT-4 + AgentCoder | Accuracy | 91.8 | — | Unverified |
| 6 | CodeSim (GPT4o) | Accuracy | 90.7 | — | Unverified |
| 7 | Jiutian (large model) | Accuracy | 90 | — | Unverified |
| 8 | GPT-3.5 Turbo (ChatGPT) + AgentCoder | Accuracy | 89.9 | — | Unverified |
| 9 | MapCoder (GPT-4o) | Accuracy | 89.7 | — | Unverified |
| 10 | GPT-4 (ChatGPT Plus) | Accuracy | 87.5 | — | Unverified |
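
MBPP-style accuracy is typically measured by executing each generated program against the task's assert statements and counting a problem as solved only if every assertion passes. A minimal sketch of such a harness is below; the function names are illustrative, and real evaluation harnesses add sandboxing (resource limits, no filesystem or network access) that is omitted here:

```python
import multiprocessing

def _run(candidate_src: str, test_src: str, result) -> None:
    # Execute the candidate and its tests in a fresh namespace.
    try:
        namespace = {}
        exec(candidate_src, namespace)   # defines the generated function(s)
        exec(test_src, namespace)        # runs the benchmark's assert statements
        result.value = 1
    except Exception:
        result.value = 0

def passes_tests(candidate_src: str, test_src: str, timeout: float = 5.0) -> bool:
    """Return True if the generated code passes all asserts within the timeout."""
    result = multiprocessing.Value("i", 0)
    proc = multiprocessing.Process(target=_run, args=(candidate_src, test_src, result))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():        # kill generations that hang or loop forever
        proc.terminate()
        proc.join()
    return bool(result.value)

# Benchmark accuracy = solved problems / total problems.
```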

APPS

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LPW (GPT-4o) | Introductory Pass@1 | 87.2 | — | Unverified |
| 2 | MoTCoder-32B-V1.5 | Introductory Pass@1 | 68.44 | — | Unverified |
| 3 | MoTCoder-7B-V1.5 | Introductory Pass@1 | 54.26 | — | Unverified |
| 4 | code-davinci-002 175B (CodeT) | Introductory Pass@1 | 47.3 | — | Unverified |
| 5 | deepseek-ai/deepseek-coder-6.7b-instruct | Introductory Pass@1 | 33.8 | — | Unverified |
| 6 | code-davinci-002 175B | Introductory Pass@1 | 31.92 | — | Unverified |
| 7 | CodeChain+WizardCoder-15b | Introductory Pass@1 | 29.3 | — | Unverified |
| 8 | WizardCoder-15b | Introductory Pass@1 | 26.29 | — | Unverified |
| 9 | CodeSim (GPT4) | Introductory Pass@1 | 26.04 | — | Unverified |
| 10 | AlphaCode 1B Filtered from 50000 | Competition Pass@any | 22 | — | Unverified |
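
The Pass@1 and Pass@any numbers above belong to the pass@k family of metrics: generate n samples per problem, count the c that pass all tests, and estimate the probability that at least one of k samples would pass. The standard unbiased estimator is 1 - C(n-c, k)/C(n, k); a minimal sketch with illustrative values:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

print(pass_at_k(20, 4, 1))             # 0.2
print(round(pass_at_k(20, 4, 5), 2))   # 0.72
```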

CoNaLa

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PanGu-Coder-FT-I | BLEU | 44.32 | — | Unverified |
| 2 | RoBERTaMarian | BLEU | 35.74 | — | Unverified |
| 3 | MarianCG | BLEU | 34.43 | — | Unverified |
| 4 | TranX + BERT w/mined | BLEU | 34.2 | — | Unverified |
| 5 | BERT + TAE | BLEU | 33.41 | — | Unverified |
| 6 | BERTMarian | BLEU | 32.46 | — | Unverified |
| 7 | External Knowledge With API + Reranking | BLEU | 32.26 | — | Unverified |
| 8 | External Knowledge With API | BLEU | 30.69 | — | Unverified |
| 9 | BART W/ Mined | BLEU | 30.55 | — | Unverified |
| 10 | ELECTRAMarian | BLEU | 30.18 | — | Unverified |
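
The BLEU scores above measure n-gram overlap between the generated snippet and the reference snippet rather than execution behaviour. A rough sketch using NLTK's sentence-level BLEU is below; the official CoNaLa scorer applies its own code-aware tokenization, so whitespace splitting here is a simplification:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference  = "x = [int(v) for v in s.split(',')]".split()
hypothesis = "x = [int(v) for v in s.split(',')]".split()

# Smoothing avoids zero scores when a higher-order n-gram has no matches.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
print(f"BLEU: {score:.2f}")  # 1.00 for an exact match under this tokenization
```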

Django

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MarianCG | Accuracy | 81.83 | — | Unverified |
| 2 | BERT + TAE | Accuracy | 81.03 | — | Unverified |
| 3 | TranX + BERT w/mined | Accuracy | 81.03 | — | Unverified |
| 4 | Reranker | Accuracy | 80.2 | — | Unverified |
| 5 | LUKEMarian | Accuracy | 78.5 | — | Unverified |
| 6 | RoBERTaMarian | Accuracy | 77.95 | — | Unverified |
| 7 | BERTMarian | Accuracy | 76.68 | — | Unverified |
| 8 | Tranx | Accuracy | 73.7 | — | Unverified |
| 9 | ELECTRAMarian | Accuracy | 65.32 | — | Unverified |
| 10 | lpn (Ling et al., 2016) | Accuracy | 62.3 | — | Unverified |
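
Accuracy on Django is conventionally exact match between the generated line of code and the reference annotation. A common way to make that comparison robust to spacing is to compare token sequences rather than raw strings; the normalization below is an illustrative assumption, since published systems differ in the details:

```python
import io
import tokenize

def code_tokens(src: str) -> list[str]:
    """Lex Python source into token strings, dropping pure-layout tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER}
    return [tok.string for tok in tokenize.generate_tokens(io.StringIO(src).readline)
            if tok.type not in skip]

def exact_match(prediction: str, reference: str) -> bool:
    return code_tokens(prediction) == code_tokens(reference)

print(exact_match("x = foo( 1 , 2 )", "x = foo(1, 2)"))  # True: spacing differences ignored
print(exact_match("x = foo(1)", "x = foo(2)"))           # False: different literal
```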

WikiSQL

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NL2SQL-RULE | Execution Accuracy | 89.2 | — | Unverified |
| 2 | TypeSQL+TC (Yu et al., 2018)+ | Execution Accuracy | 82.6 | — | Unverified |
| 3 | Tranx | Execution Accuracy | 78.6 | — | Unverified |
| 4 | STAMP+RL (Sun et al., 2018)+ | Execution Accuracy | 74.6 | — | Unverified |
| 5 | STAMP (Sun et al., 2018)+ | Execution Accuracy | 74.4 | — | Unverified |
| 6 | TypeSQL (Yu et al., 2018) | Execution Accuracy | 73.5 | — | Unverified |
| 7 | PT-MAML (Huang et al., 2018) | Execution Accuracy | 68 | — | Unverified |
| 8 | Bidirectional Attention for SQL Generation | Execution Accuracy | 62.5 | — | Unverified |
| 9 | Seq2SQL (Zhong et al., 2017) | Execution Accuracy | 59.4 | — | Unverified |
| 10 | Seq2Seq (Zhong et al., 2017) | Execution Accuracy | 35.9 | — | Unverified |
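
Execution accuracy counts a predicted SQL query as correct when it returns the same result as the gold query on the underlying table, so differently written but equivalent queries still score. A minimal sketch against an in-memory SQLite table; the schema and queries are illustrative, not taken from WikiSQL:

```python
import sqlite3

def same_execution_result(conn, predicted_sql: str, gold_sql: str) -> bool:
    """Run both queries and compare results as sorted multisets (row order ignored)."""
    try:
        pred = sorted(map(tuple, conn.execute(predicted_sql).fetchall()))
        gold = sorted(map(tuple, conn.execute(gold_sql).fetchall()))
    except sqlite3.Error:
        return False  # a query failed to parse or execute
    return pred == gold

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?)",
                 [("Ann", "Red", 31), ("Bo", "Blue", 12), ("Cy", "Red", 7)])

# Different surface forms, same result set -> counted as correct.
print(same_execution_result(conn,
                            "SELECT name FROM players WHERE points > 10",
                            "SELECT name FROM players WHERE points >= 11"))  # True
```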

RES-Q

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | QurrentOS-coder + Claude 3.5 Sonnet | pass@1 | 58 | — | Unverified |
| 2 | QurrentOS-coder + GPT-4o | pass@1 | 46 | — | Unverified |
| 3 | QurrentOS-coder + GPT-4 Turbo | pass@1 | 37 | — | Unverified |
| 4 | QurrentOS-coder + Claude 3 Opus | pass@1 | 36 | — | Unverified |
| 5 | QurrentOS-coder + Gemini 1.5 Pro | pass@1 | 30 | — | Unverified |
| 6 | QurrentOS-coder + GPT-4 | pass@1 | 30 | — | Unverified |
| 7 | QurrentOS-coder + DeepSeek-Coder-V2 | pass@1 | 29 | — | Unverified |
| 8 | QurrentOS-coder + Llama 3 70b | pass@1 | 20 | — | Unverified |
| 9 | QurrentOS-coder + Qwen-72B-Instruct | pass@1 | 18 | — | Unverified |

CodeContests

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EG-CFG (DeepSeek-V3-0324) | Test Set pass@1 | 58.18 | — | Unverified |
| 2 | LPW (GPT-4o) | Test Set pass@1 | 34.7 | — | Unverified |
| 3 | MapCoder (GPT-4) | Test Set pass@1 | 28.5 | — | Unverified |
| 4 | CodeSim (GPT4) | Test Set pass@1 | 28.4 | — | Unverified |
| 5 | MoTCoder-15B | Test Set pass@1 | 26.34 | — | Unverified |
| 6 | MoTCoder-7B-v1.5 | Test Set pass@1 | 20.77 | — | Unverified |
| 7 | CodeChain + WizardCoder-15B | Test Set pass@1 | 2.35 | — | Unverified |
| 8 | WizardCoder-15B | Test Set pass@1 | 1.11 | — | Unverified |

HumanEval

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeepSeek-R1 (MGDebugger) | Pass@1 | 100 | — | Unverified |
| 2 | LLaMA 3 | Pass@1 | 99.4 | — | Unverified |
| 3 | QualityFlow (Sonnet-3.5) | Pass@1 | 98.8 | — | Unverified |
| 4 | Phi-2 | Pass@1 | 98.2 | — | Unverified |
| 5 | EG-CFG (DeepSeek-V3-0324) | Pass@1 | 96.95 | — | Unverified |
| 6 | Mistral 7B | Pass@1 | 93.9 | — | Unverified |
| 7 | Claude Sonnet 3.5 | Pass@1 | 90.85 | — | Unverified |
| 8 | L2MAC (GPT-4) | Pass@1 | 90.2 | — | Unverified |

PECC

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude 3 Haiku | Pass@3 | 27.67 | — | Unverified |
| 2 | GPT-3.5 Turbo | Pass@3 | 23.75 | — | Unverified |
| 3 | codechat-bison | Pass@3 | 11.39 | — | Unverified |
| 4 | chat-bison | Pass@3 | 8.48 | — | Unverified |
| 5 | Mixtral-8x7B-Instruct | Pass@3 | 8.35 | — | Unverified |
| 6 | Phi-3-mini-128k-instruct | Pass@3 | 7.18 | — | Unverified |
| 7 | WizardLM-2-7B | Pass@3 | 3.72 | — | Unverified |
| 8 | Llama-3-8B-Instruct | Pass@3 | 3.1 | — | Unverified |

WebApp1K-React

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | o1-preview | pass@1 | 0.95 | — | Unverified |
| 2 | o1-mini | pass@1 | 0.94 | — | Unverified |
| 3 | gpt-4o-2024-08-06 | pass@1 | 0.89 | — | Unverified |
| 4 | claude-3.5-sonnet | pass@1 | 0.88 | — | Unverified |
| 5 | deepseek-v2.5 | pass@1 | 0.83 | — | Unverified |
| 6 | mistral-large-2 | pass@1 | 0.78 | — | Unverified |
| 7 | deepseek-coder-v2-instruct | pass@1 | 0.7 | — | Unverified |
| 8 | llama-v3p1-405b-instruct | pass@1 | 0.3 | — | Unverified |

CoNaLa-Ext

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BART W/ Mined | BLEU | 35.32 | — | Unverified |
| 2 | BART Base | BLEU | 34.35 | — | Unverified |
| 3 | External Knowledge With API + Reranking | BLEU | 20.54 | — | Unverified |
| 4 | External Knowledge With API | BLEU | 20.37 | — | Unverified |
| 5 | Reranker | BLEU | 19.85 | — | Unverified |
| 6 | TranX | BLEU | 18.85 | — | Unverified |

WebApp1k-Duo-React

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | claude-3-5-sonnet | pass@1 | 0.68 | — | Unverified |
| 2 | o1-mini | pass@1 | 0.67 | — | Unverified |