| Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement | Dec 30, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | Dec 25, 2024 | CPU, GPU | Unverified | 0 |
| Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | Dec 18, 2024 | HumanEval, Imitation Learning | Unverified | 0 |
| PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation | Dec 17, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Dec 17, 2024 | GSM8K, HumanEval | Unverified | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | Dec 11, 2024 | Domain Generalization, GSM8K | Unverified | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | Dec 9, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Does Few-Shot Learning Help LLM Performance in Code Synthesis? | Dec 3, 2024 | Code Generation, Few-Shot Learning | Unverified | 0 |
| Addressing Data Leakage in HumanEval Using Combinatorial Test Design | Dec 2, 2024 | HumanEval | Unverified | 0 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Nov 26, 2024 | HumanEval, mbpp | Code Available | 0 |
| A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks | Nov 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs | Nov 20, 2024 | Code Generation, HumanEval | Unverified | 0 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | Nov 13, 2024 | HumanEval, Language Modeling | Unverified | 0 |
| Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | Nov 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | Nov 7, 2024 | Code Generation, Decision Making | Unverified | 0 |
| InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation | Nov 1, 2024 | Code Translation, HumanEval | Code Available | 0 |
| Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models | Oct 30, 2024 | Code Generation, HumanEval | Unverified | 0 |
| FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | Oct 28, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | Oct 24, 2024 | Decision Making, HumanEval | Unverified | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | Oct 23, 2024 | GSM8K, HumanEval | Unverified | 0 |
| MojoBench: Language Modeling and Benchmarks for Mojo | Oct 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Self-Evolving Multi-Agent Collaboration Networks for Software Development | Oct 22, 2024 | HumanEval | Unverified | 0 |
| Scattered Forest Search: Smarter Code Space Exploration with LLMs | Oct 22, 2024 | Code Generation, Diversity | Unverified | 0 |
| Semantic-guided Search for Efficient Program Repair with Large Language Models | Oct 22, 2024 | GPU, HumanEval | Unverified | 0 |
| Self-Explained Keywords Empower Large Language Models for Code Generation | Oct 21, 2024 | Code Generation, HumanEval | Unverified | 0 |
| mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Oct 19, 2024 | Code Generation, Diversity | Code Available | 0 |
| CELI: Controller-Embedded Language Model Interactions | Oct 18, 2024 | Articles, Code Generation | Unverified | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Oct 15, 2024 | HumanEval, Language Modelling | Unverified | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 |
| KV Prediction for Improved Time to First Token | Oct 10, 2024 | Code Completion, CPU | Unverified | 0 |
| Context-Augmented Code Generation Using Programming Knowledge Graphs | Oct 9, 2024 | Code Generation, HumanEval | Unverified | 0 |
| AIME: AI System Optimization via Multiple LLM Evaluators | Oct 4, 2024 | Code Generation, HumanEval | Unverified | 0 |
| RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Oct 2, 2024 | Code Generation, HumanEval | Code Available | 0 |
| AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation | Oct 1, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity | Sep 24, 2024 | Code Generation, Contrastive Learning | Unverified | 0 |
| GRIN: GRadient-INformed MoE | Sep 18, 2024 | HellaSwag, HumanEval | Unverified | 0 |
| RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | Sep 15, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Measuring the Influence of Incorrect Code on Test Generation | Sep 14, 2024 | HumanEval, Large Language Model | Code Available | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | Unverified | 0 |
| USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | Sep 9, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | Sep 6, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | Sep 3, 2024 | Code Generation, HumanEval | Unverified | 0 |
| CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | Aug 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | Aug 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| AutoTest: Evolutionary Code Solution Selection with Test Cases | Aug 22, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | Aug 18, 2024 | Diversity, GPU | Unverified | 0 |
| Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | Aug 18, 2024 | HumanEval, Mathematical Reasoning | Unverified | 0 |
| CodeMirage: Hallucinations in Code Generated by Large Language Models | Aug 14, 2024 | Code Generation, Hallucination | Unverified | 0 |
| CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | Aug 8, 2024 | HumanEval, Retrieval | Unverified | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | Jul 30, 2024 | Benchmarking, Code Completion | Unverified | 0 |