| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | Feb 5, 2024 | GSM8KMath | —Unverified | 0 | 0 |
| Brevity is the soul of wit: Pruning long files for code generation | Jun 29, 2024 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEvalmbpp | —Unverified | 0 | 0 |
| Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement | Dec 30, 2024 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | Apr 5, 2025 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | Jan 8, 2024 | Code GenerationDiversity | —Unverified | 0 | 0 |
| AceCoder: Utilizing Existing Code to Enhance Code Generation | Mar 31, 2023 | Code Generationmbpp | —Unverified | 0 | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code CompletionGSM8K | —Unverified | 0 | 0 |
| Type-Constrained Code Generation with Language Models | Apr 12, 2025 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | Jun 11, 2024 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | Mar 23, 2024 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting | May 25, 2024 | Contrastive Learningmbpp | —Unverified | 0 | 0 |
| Prompt Baking | Sep 4, 2024 | ARCGSM8K | —Unverified | 0 | 0 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | Jun 20, 2024 | GSM8KHeuristic Search | —Unverified | 0 | 0 |
| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | Jan 20, 2025 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8KHumanEval | —Unverified | 0 | 0 |
| UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | Feb 17, 2025 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | Oct 24, 2024 | Decision MakingHumanEval | —Unverified | 0 | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code GenerationGSM8K | —Unverified | 0 | 0 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | Nov 13, 2024 | HumanEvalLanguage Modeling | —Unverified | 0 | 0 |
| ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity | Dec 12, 2023 | Code GenerationLanguage Modeling | —Unverified | 0 | 0 |
| Context-Augmented Code Generation Using Programming Knowledge Graphs | Oct 9, 2024 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | Dec 9, 2024 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | Jun 25, 2025 | Code GenerationHumanEval | —Unverified | 0 | 0 |
| ACECODER: Acing Coder RL via Automated Test-Case Synthesis | Feb 3, 2025 | HumanEvalmbpp | —Unverified | 0 | 0 |