| Rethinking Repetition Problems of LLMs in Code Generation | May 15, 2025 | Code Generation, HumanEval | Code Available | 1 | 5 |
| RLTF: Reinforcement Learning from Unit Test Feedback | Jul 10, 2023 | Code Generation, mbpp | Code Available | 1 | 5 |
| EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization | May 24, 2024 | Code Generation, HumanEval | Code Available | 1 | 5 |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | May 23, 2024 | Computational Efficiency, GSM8K | Code Available | 1 | 5 |
| Unsupervised Evaluation of Code LLMs with Round-Trip Correctness | Feb 13, 2024 | HumanEval, mbpp | Code Available | 1 | 5 |
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Apr 23, 2024 | HumanEval, mbpp | Code Available | 1 | 5 |
| RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Oct 2, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Apr 11, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| CodePAD: Sequence-based Code Generation with Pushdown Automaton | Nov 2, 2022 | Code Generation, mbpp | Code Available | 0 | 5 |
| FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | Oct 28, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Teaching Large Language Models to Self-Debug | Apr 11, 2023 | Code Generation, Language Modeling | Code Available | 0 | 5 |
| Self-Correcting Code Generation Using Small Language Models | May 29, 2025 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Dec 25, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Underwater Object Tracker: UOSTrack for Marine Organism Grasping of Underwater Vehicles | Jan 4, 2023 | Data Augmentation, mbpp | Code Available | 0 | 5 |
| Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency | Sep 29, 2023 | Code Generation, HumanEval | Code Available | 0 | 5 |
| AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation | Oct 1, 2024 | Code Generation, HumanEval | Code Available | 0 | 5 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Nov 26, 2024 | HumanEval, mbpp | Code Available | 0 | 5 |
| Textbooks Are All You Need | Jun 20, 2023 | All, Code Generation | Unverified | 0 | 0 |
| LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Mar 12, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8K, HumanEval | Unverified | 0 | 0 |
| Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer | Aug 19, 2024 | Code Generation, Cross-Lingual Transfer | Unverified | 0 | 0 |
| Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | Oct 16, 2023 | Code Generation, HumanEval | Unverified | 0 | 0 |
| USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | Sep 9, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| The Program Testing Ability of Large Language Models for Code | Oct 9, 2023 | HumanEval, mbpp | Unverified | 0 | 0 |
| The Stack: 3 TB of permissively licensed source code | Nov 20, 2022 | HumanEval, mbpp | Unverified | 0 | 0 |
| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | Feb 5, 2024 | GSM8K, Math | Unverified | 0 | 0 |
| Brevity is the soul of wit: Pruning long files for code generation | Jun 29, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, mbpp | Unverified | 0 | 0 |
| Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement | Dec 30, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | Apr 5, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | Jan 8, 2024 | Code Generation, Diversity | Unverified | 0 | 0 |
| AceCoder: Utilizing Existing Code to Enhance Code Generation | Mar 31, 2023 | Code Generation, mbpp | Unverified | 0 | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | Unverified | 0 | 0 |
| Type-Constrained Code Generation with Language Models | Apr 12, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | Jun 11, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | Mar 23, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting | May 25, 2024 | Contrastive Learning, mbpp | Unverified | 0 | 0 |
| Prompt Baking | Sep 4, 2024 | ARC, GSM8K | Unverified | 0 | 0 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | Jun 20, 2024 | GSM8K, Heuristic Search | Unverified | 0 | 0 |
| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | Jan 20, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | Unverified | 0 | 0 |
| UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | Feb 17, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | Oct 24, 2024 | Decision Making, HumanEval | Unverified | 0 | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | Unverified | 0 | 0 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | Nov 13, 2024 | HumanEval, Language Modeling | Unverified | 0 | 0 |
| ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity | Dec 12, 2023 | Code Generation, Language Modeling | Unverified | 0 | 0 |
| Context-Augmented Code Generation Using Programming Knowledge Graphs | Oct 9, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | Dec 9, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | Jun 25, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| ACECODER: Acing Coder RL via Automated Test-Case Synthesis | Feb 3, 2025 | HumanEval, mbpp | Unverified | 0 | 0 |