| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | Jan 20, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | Unverified | 0 |
| UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | Feb 17, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | Oct 24, 2024 | Decision Making, HumanEval | Unverified | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | Unverified | 0 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | Nov 13, 2024 | HumanEval, Language Modeling | Unverified | 0 |
| ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity | Dec 12, 2023 | Code Generation, Language Modeling | Unverified | 0 |
| Context-Augmented Code Generation Using Programming Knowledge Graphs | Oct 9, 2024 | Code Generation, HumanEval | Unverified | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | Dec 9, 2024 | Code Generation, HumanEval | Unverified | 0 |
| SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | Jun 25, 2025 | Code Generation, HumanEval | Unverified | 0 |