| Predicting Code Coverage without Execution | Jul 25, 2023 | HumanEval | CodeCode Available | 1 | 5 |
| ANPL: Towards Natural Programming with Interactive Decomposition | May 29, 2023 | ARCCode Generation | CodeCode Available | 1 | 5 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Apr 11, 2024 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual LearningGSM8K | CodeCode Available | 0 | 5 |
| Self-Correcting Code Generation Using Small Language Models | May 29, 2025 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Jan 15, 2024 | HumanEvalLanguage Modelling | CodeCode Available | 0 | 5 |
| Self-Edit: Fault-Aware Code Editor for Code Generation | May 6, 2023 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| HumanEval on Latest GPT Models -- 2024 | Feb 20, 2024 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic ReasoningCode Completion | CodeCode Available | 0 | 5 |
| RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Oct 2, 2024 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEvalLanguage Modeling | CodeCode Available | 0 | 5 |
| Measuring the Influence of Incorrect Code on Test Generation | Sep 14, 2024 | HumanEvalLarge Language Model | CodeCode Available | 0 | 5 |
| AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation | Oct 1, 2024 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | Oct 28, 2024 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Using Large Language Models to Generate JUnit Tests: An Empirical Study | Apr 30, 2023 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Oct 28, 2023 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | FairnessGSM8K | CodeCode Available | 0 | 5 |
| Evaluating How Fine-tuning on Bimodal Data Effects Code Generation | Nov 15, 2022 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Sep 29, 2023 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | May 12, 2025 | Code GenerationComment Generation | CodeCode Available | 0 | 5 |
| CoCoNUT: Structural Code Understanding does not fall out of a tree | Jan 27, 2025 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Nov 26, 2024 | HumanEvalmbpp | CodeCode Available | 0 | 5 |
| mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Oct 19, 2024 | Code GenerationDiversity | CodeCode Available | 0 | 5 |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | Sep 6, 2024 | Code GenerationHumanEval | CodeCode Available | 0 | 5 |
| Can Programming Languages Boost Each Other via Instruction Tuning? | Aug 31, 2023 | HumanEval | CodeCode Available | 0 | 5 |