| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | Jan 29, 2024 | HumanEval | Unverified | 0 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Jan 15, 2024 | HumanEval, Language Modelling | Code Available | 0 |
| OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Jan 12, 2024 | Code Generation, HumanEval | Code Available | 1 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | Jan 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | Jan 8, 2024 | Code Generation, Diversity | Unverified | 0 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Jan 5, 2024 | HumanEval, Prediction | Code Available | 4 |
| RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | Dec 25, 2023 | HumanEval, parameter-efficient fine-tuning | Code Available | 1 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Dec 25, 2023 | Code Generation, HumanEval | Code Available | 0 |
| AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Dec 20, 2023 | Code Generation, HumanEval | Code Available | 2 |
| A Review of Repository Level Prompting for LLMs | Dec 15, 2023 | Code Completion, Code Generation | Unverified | 0 |
| Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | Dec 5, 2023 | Code Generation, HumanEval | Unverified | 0 |
| Magicoder: Empowering Code Generation with OSS-Instruct | Dec 4, 2023 | Code Generation, HumanEval | Code Available | 4 |
| Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | Nov 13, 2023 | Code Completion, HumanEval | Unverified | 0 |
| Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Nov 8, 2023 | HumanEval, MMLU | Code Available | 2 |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Oct 28, 2023 | Code Generation, HumanEval | Code Available | 0 |
| CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Oct 17, 2023 | Code Completion, HumanEval | Code Available | 1 |
| Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | Oct 16, 2023 | Code Generation, HumanEval | Unverified | 0 |
| CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Oct 13, 2023 | Code Generation, HumanEval | Code Available | 1 |
| CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | Oct 10, 2023 | Code Generation, Code Translation | Unverified | 0 |
| The Program Testing Ability of Large Language Models for Code | Oct 9, 2023 | HumanEval, mbpp | Unverified | 0 |
| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Oct 6, 2023 | Code Generation, Decision Making | Code Available | 2 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Sep 29, 2023 | Code Generation, HumanEval | Code Available | 0 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEval, Language Modeling | Code Available | 0 |
| LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression | Sep 25, 2023 | Code Generation, HumanEval | Unverified | 0 |
| Baichuan 2: Open Large-scale Language Models | Sep 19, 2023 | Feature Engineering, GSM8K | Code Available | 4 |
| Can Programming Languages Boost Each Other via Instruction Tuning? | Aug 31, 2023 | HumanEval | Code Available | 0 |
| Code Llama: Open Foundation Models for Code | Aug 24, 2023 | 16k, Code Generation | Code Available | 6 |
| CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | Aug 17, 2023 | Code Generation, Few-Shot Learning | Unverified | 0 |
| OctoPack: Instruction Tuning Code Large Language Models | Aug 14, 2023 | Code Generation, Code Repair | Code Available | 3 |
| ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation | Aug 3, 2023 | Class-level Code Generation, Code Generation | Code Available | 1 |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Jul 27, 2023 | Code Generation, HumanEval | Unverified | 0 |
| Predicting Code Coverage without Execution | Jul 25, 2023 | HumanEval | Code Available | 1 |
| Textbooks Are All You Need | Jun 20, 2023 | All, Code Generation | Unverified | 0 |
| Is Self-Repair a Silver Bullet for Code Generation? | Jun 16, 2023 | Code Generation, HumanEval | Code Available | 1 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Jun 14, 2023 | Code Generation, HumanEval | Code Available | 5 |
| Large Language Models of Code Fail at Completing Code with Potential Bugs | Jun 6, 2023 | Code Completion, HumanEval | Code Available | 0 |
| SelfEvolve: A Code Evolution Framework via Large Language Models | Jun 5, 2023 | Code Generation, HumanEval | Unverified | 0 |
| ANPL: Towards Natural Programming with Interactive Decomposition | May 29, 2023 | ARC, Code Generation | Code Available | 1 |
| LeTI: Learning to Generate from Textual Interactions | May 17, 2023 | Code Generation, Event Argument Extraction | Code Available | 1 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic Reasoning, Code Completion | Code Available | 0 |
| Structured Chain-of-Thought Prompting for Code Generation | May 11, 2023 | Code Generation, HumanEval | Unverified | 0 |
| StarCoder: may the source be with you! | May 9, 2023 | 8k, Code Generation | Code Available | 5 |
| Self-Edit: Fault-Aware Code Editor for Code Generation | May 6, 2023 | Code Generation, HumanEval | Code Available | 0 |
| Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | May 2, 2023 | Code Generation, HumanEval | Code Available | 3 |
| Using Large Language Models to Generate JUnit Tests: An Empirical Study | Apr 30, 2023 | Code Generation, HumanEval | Code Available | 0 |
| Stochastic Code Generation | Apr 14, 2023 | Code Generation, Decoder | Unverified | 0 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Mar 30, 2023 | Benchmarking, Code Generation | Code Available | 5 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Mar 20, 2023 | Decision Making, HumanEval | Code Available | 4 |