| Evaluating Large Language Models for Code Review | May 26, 2025 | HumanEval | —Unverified | 0 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | Mar 25, 2024 | HumanEval | —Unverified | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code Generation, Hallucination | —Unverified | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Dec 17, 2024 | GSM8K, HumanEval | —Unverified | 0 |
| From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? | May 24, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | Mar 10, 2025 | HumanEval, Program Synthesis | —Unverified | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEval, Language Modelling | —Unverified | 0 |
| Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | Mar 7, 2025 | Code Generation, HumanEval | —Unverified | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | —Unverified | 0 |
| Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees | Jun 17, 2025 | Code Translation, HumanEval | —Unverified | 0 |