| CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Aug 7, 2024 | HumanEval, mbpp | Code Available | 7 | 5 |
| EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Jul 4, 2025 | Code Generation, Math | Code Available | 7 | 5 |
| Code Llama: Open Foundation Models for Code | Aug 24, 2023 | 16k, Code Generation | Code Available | 6 | 5 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Jun 14, 2023 | Code Generation, HumanEval | Code Available | 5 | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Feb 22, 2024 | Code Generation, HumanEval | Code Available | 5 | 5 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Feb 25, 2024 | Code Generation, HumanEval | Code Available | 4 | 5 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | May 12, 2025 | Code Generation | Code Available | 3 | 5 |
| KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Mar 4, 2025 | HumanEval, mbpp | Code Available | 3 | 5 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Apr 15, 2025 | ARC, HellaSwag | Code Available | 3 | 5 |
| MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Aug 17, 2022 | Benchmarking, Code Generation | Code Available | 2 | 5 |