| Addressing Data Leakage in HumanEval Using Combinatorial Test Design | Dec 2, 2024 | HumanEval | Unverified | 0 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Nov 26, 2024 | HumanEval, MBPP | Code Available | 0 |
| A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks | Nov 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Planning-Driven Programming: A Large Language Model Programming Workflow | Nov 21, 2024 | Code Generation, HumanEval | Code Available | 1 |
| DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs | Nov 20, 2024 | Code Generation, HumanEval | Unverified | 0 |
| PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback | Nov 18, 2024 | HumanEval, MBPP | Code Available | 1 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | Nov 13, 2024 | HumanEval, Language Modeling | Unverified | 0 |
| Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | Nov 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | Nov 7, 2024 | Code Generation, Decision Making | Unverified | 0 |
| InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation | Nov 1, 2024 | Code Translation, HumanEval | Code Available | 0 |
| SelfCodeAlign: Self-Alignment for Code Generation | Oct 31, 2024 | Code Generation, HumanEval | Code Available | 3 |
| Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models | Oct 30, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Oct 29, 2024 | Code Completion, Code Generation | Code Available | 1 |
| FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | Oct 28, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | Oct 24, 2024 | Decision Making, HumanEval | Unverified | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | Oct 23, 2024 | GSM8K, HumanEval | Unverified | 0 |
| MojoBench: Language Modeling and Benchmarks for Mojo | Oct 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Scattered Forest Search: Smarter Code Space Exploration with LLMs | Oct 22, 2024 | Code Generation, Diversity | Unverified | 0 |
| Self-Evolving Multi-Agent Collaboration Networks for Software Development | Oct 22, 2024 | HumanEval | Unverified | 0 |
| Semantic-guided Search for Efficient Program Repair with Large Language Models | Oct 22, 2024 | GPU, HumanEval | Unverified | 0 |
| Self-Explained Keywords Empower Large Language Models for Code Generation | Oct 21, 2024 | Code Generation, HumanEval | Unverified | 0 |
| mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Oct 19, 2024 | Code Generation, Diversity | Code Available | 0 |
| CELI: Controller-Embedded Language Model Interactions | Oct 18, 2024 | Articles, Code Generation | Unverified | 0 |
| HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | Oct 16, 2024 | Code Generation, HumanEval | Code Available | 1 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEval, Language Modelling | Unverified | 0 |