| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, MBPP | Unverified | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | Apr 17, 2024 | Code Generation, HumanEval | Unverified | 0 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Apr 11, 2024 | Code Generation, HumanEval | Code Available | 0 |
| The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers | Apr 3, 2024 | HumanEval | Code Available | 1 |
| Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | Apr 2, 2024 | Code Generation, HumanEval | Code Available | 1 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | Apr 1, 2024 | Code Generation, Hallucination | Unverified | 0 |
| Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Mar 28, 2024 | Code Generation, HumanEval | Code Available | 2 |
| CYCLE: Learning to Self-Refine the Code Generation | Mar 27, 2024 | Code Generation, HumanEval | Code Available | 1 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | Mar 25, 2024 | HumanEval | Unverified | 0 |
| CodeShell Technical Report | Mar 23, 2024 | 8k, HumanEval | Unverified | 0 |