| Test-Driven Development for Code Generation | Feb 21, 2024 | Code Generation, HumanEval | Unverified | 0 |
| HumanEval on Latest GPT Models -- 2024 | Feb 20, 2024 | Code Generation, HumanEval | Code Available | 0 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | Unverified | 0 |
| NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | Jan 29, 2024 | HumanEval | Unverified | 0 |
| A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models | Jan 15, 2024 | HumanEval, Language Modelling | Code Available | 0 |
| Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs | Jan 11, 2024 | Code Generation, HumanEval | Unverified | 0 |
| PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | Jan 8, 2024 | Code Generation, Diversity | Unverified | 0 |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | Dec 25, 2023 | Code Generation, HumanEval | Code Available | 0 |
| A Review of Repository Level Prompting for LLMs | Dec 15, 2023 | Code Completion, Code Generation | Unverified | 0 |
| Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data | Dec 5, 2023 | Code Generation, HumanEval | Unverified | 0 |