| Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions | Dec 20, 2022 | Automated Theorem Proving, Code Generation | Code Available | 2 |
| ReCode: Robustness Evaluation of Code Generation Models | Dec 20, 2022 | Code Generation, HumanEval | Code Available | 1 |
| Large Language Models Meet NL2Code: A Survey | Dec 19, 2022 | HumanEval, Survey | Unverified | 0 |
| The Stack: 3 TB of permissively licensed source code | Nov 20, 2022 | HumanEval, MBPP | Unverified | 0 |
| Evaluating How Fine-tuning on Bimodal Data Effects Code Generation | Nov 15, 2022 | Code Generation, HumanEval | Code Available | 0 |
| Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? | Oct 26, 2022 | HumanEval, Language Modelling | Unverified | 0 |
| Multi-lingual Evaluation of Code Generation Models | Oct 26, 2022 | Code Completion, Code Generation | Code Available | 1 |
| ContraCLM: Contrastive Learning For Causal Language Model | Oct 3, 2022 | Code Generation, Code Search | Code Available | 1 |
| MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Aug 17, 2022 | Benchmarking, Code Generation | Code Available | 2 |
| Interactive Code Generation via Test-Driven User-Intent Formalization | Aug 11, 2022 | Code Generation, HumanEval | Unverified | 0 |
| CodeT: Code Generation with Generated Tests | Jul 21, 2022 | Code Generation, HumanEval | Code Available | 2 |
| Fault-Aware Neural Code Rankers | Jun 4, 2022 | Code Generation, HumanEval | Code Available | 1 |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | Mar 25, 2022 | Code Generation, HumanEval | Code Available | 6 |
| Evaluating Large Language Models Trained on Code | Jul 7, 2021 | Code Generation, HumanEval | Code Available | 3 |