| Dynamic Scaling of Unit Tests for Code Reward Modeling | Jan 2, 2025 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Structured Chain-of-Thought Prompting for Code Generation | May 11, 2023 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | May 29, 2025 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Evaluating Large Language Models for Code Review | May 26, 2025 | HumanEval | —Unverified | 0 | 0 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | Mar 25, 2024 | HumanEval | —Unverified | 0 | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code Generation, Hallucination | —Unverified | 0 | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Dec 17, 2024 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? | May 24, 2025 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | Mar 10, 2025 | HumanEval, Program Synthesis | —Unverified | 0 | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEval, Language Modelling | —Unverified | 0 | 0 |