| Type-Constrained Code Generation with Language Models | Apr 12, 2025 | Code Generation, HumanEval | Unverified | 0 |
| UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | Feb 17, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Validating LLM-Generated Programs with Metamorphic Prompt Testing | Jun 11, 2024 | HumanEval | Unverified | 0 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | Nov 13, 2024 | HumanEval, Language Modeling | Unverified | 0 |
| SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | Mar 23, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Large Language Models Meet NL2Code: A Survey | Dec 19, 2022 | HumanEval, Survey | Unverified | 0 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Jan 15, 2024 | HumanEval, Language Modelling | Code available | 0 |
| Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | May 12, 2025 | Code Generation, Comment Generation | Code available | 0 |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | Oct 28, 2023 | Code Generation, HumanEval | Code available | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Jun 10, 2024 | Benchmarking, Code Generation | Code available | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code available | 0 |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | Sep 6, 2024 | Code Generation, HumanEval | Code available | 0 |
| mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Oct 19, 2024 | Code Generation, Diversity | Code available | 0 |
| Large Language Models of Code Fail at Completing Code with Potential Bugs | Jun 6, 2023 | Code Completion, HumanEval | Code available | 0 |
| Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models | Sep 27, 2023 | HumanEval, Language Modeling | Code available | 0 |
| Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Mar 22, 2024 | Code Completion, HumanEval | Code available | 0 |
| Measuring the Influence of Incorrect Code on Test Generation | Sep 14, 2024 | HumanEval, Large Language Model | Code available | 0 |
| InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation | Nov 1, 2024 | Code Translation, HumanEval | Code available | 0 |
| CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Feb 13, 2025 | 8k, GPU | Code available | 0 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Dec 25, 2023 | Code Generation, HumanEval | Code available | 0 |
| RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Oct 2, 2024 | Code Generation, HumanEval | Code available | 0 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Nov 26, 2024 | HumanEval, mbpp | Code available | 0 |
| ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Mar 6, 2025 | Benchmarking, HumanEval | Code available | 0 |
| HumanEval on Latest GPT Models -- 2024 | Feb 20, 2024 | Code Generation, HumanEval | Code available | 0 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic Reasoning, Code Completion | Code available | 0 |