| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | May 12, 2025 | GSM8K, HumanEval | Unverified | 0 | 0 |
| AutoTest: Evolutionary Code Solution Selection with Test Cases | Aug 22, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| BASS: Batched Attention-optimized Speculative Sampling | Apr 24, 2024 | GPU, HumanEval | Unverified | 0 | 0 |
| Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | Mar 7, 2025 | Benchmarking, Bug fixing | Unverified | 0 | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | Jan 8, 2024 | Code Generation, Diversity | Unverified | 0 | 0 |
| Brevity is the soul of wit: Pruning long files for code generation | Jun 29, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | Oct 16, 2023 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Can LLMs Enable Verification in Mainstream Programming? | Mar 18, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| CELI: Controller-Embedded Language Model Interactions | Oct 18, 2024 | Articles, Code Generation | Unverified | 0 | 0 |
| CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation | Aug 17, 2023 | Code Generation, Few-Shot Learning | Unverified | 0 | 0 |
| CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model | Oct 10, 2023 | Code Generation, Code Translation | Unverified | 0 | 0 |
| CodeMirage: Hallucinations in Code Generated by Large Language Models | Aug 14, 2024 | Code Generation, Hallucination | Unverified | 0 | 0 |
| CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts | May 8, 2025 | Code Completion, Code Generation | Unverified | 0 | 0 |
| Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | Jun 18, 2024 | HumanEval, mbpp | Unverified | 0 | 0 |
| CodeShell Technical Report | Mar 23, 2024 | 8k, HumanEval | Unverified | 0 | 0 |
| CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | Nov 7, 2024 | Code Generation, Decision Making | Unverified | 0 | 0 |
| Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | Aug 18, 2024 | HumanEval, Mathematical Reasoning | Unverified | 0 | 0 |
| Context-Augmented Code Generation Using Programming Knowledge Graphs | Oct 9, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | Unverified | 0 | 0 |
| CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | Aug 8, 2024 | HumanEval, Retrieval | Unverified | 0 | 0 |
| CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | Aug 23, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Dafny as Verification-Aware Intermediate Language for Code Generation | Jan 10, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | Dec 5, 2023 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models | Oct 30, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Discrete Flow Matching | Jul 22, 2024 | HumanEval, mbpp | Unverified | 0 | 0 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | May 30, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Does Few-Shot Learning Help LLM Performance in Code Synthesis? | Dec 3, 2024 | Code Generation, Few-Shot Learning | Unverified | 0 | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | Jun 5, 2024 | GSM8K, HumanEval | Unverified | 0 | 0 |
| DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | Aug 23, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Dec 25, 2024 | CPU, GPU | Unverified | 0 | 0 |
| DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs | Nov 20, 2024 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Dynamic Scaling of Unit Tests for Code Reward Modeling | Jan 2, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Structured Chain-of-Thought Prompting for Code Generation | May 11, 2023 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | May 29, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Evaluating Large Language Models for Code Review | May 26, 2025 | HumanEval | Unverified | 0 | 0 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | Mar 25, 2024 | HumanEval | Unverified | 0 | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code Generation, Hallucination | Unverified | 0 | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Dec 17, 2024 | GSM8K, HumanEval | Unverified | 0 | 0 |
| From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation? | May 24, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | Mar 10, 2025 | HumanEval, Program Synthesis | Unverified | 0 | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEval, Language Modelling | Unverified | 0 | 0 |
| Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | Mar 7, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | Unverified | 0 | 0 |
| Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees | Jun 17, 2025 | Code Translation, HumanEval | Unverified | 0 | 0 |
| Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks | Jan 11, 2025 | Code Generation, HumanEval | Unverified | 0 | 0 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jun 9, 2025 | GSM8K, HumanEval | Unverified | 0 | 0 |
| Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | Jan 31, 2025 | Code Generation, Hallucination | Unverified | 0 | 0 |
| Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | Dec 18, 2024 | HumanEval, Imitation Learning | Unverified | 0 | 0 |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | Jan 6, 2025 | GSM8K, HumanEval | Unverified | 0 | 0 |
| Interactive Code Generation via Test-Driven User-Intent Formalization | Aug 11, 2022 | Code Generation, HumanEval | Unverified | 0 | 0 |