| MasRouter: Learning to Route LLMs for Multi-Agent Systems | Feb 16, 2025 | HumanEvalmbpp | CodeCode Available | 2 |
| CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | Feb 8, 2025 | Code GenerationHumanEval | CodeCode Available | 2 |
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Oct 2, 2024 | Auto DebuggingBug fixing | CodeCode Available | 2 |
| Training Language Models to Self-Correct via Reinforcement Learning | Sep 19, 2024 | HumanEvalMath | CodeCode Available | 2 |
| A Survey on Large Language Models for Code Generation | Jun 1, 2024 | Code GenerationHumanEval | CodeCode Available | 2 |
| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | May 18, 2024 | Code GenerationHumanEval | CodeCode Available | 2 |
| NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | May 7, 2024 | HumanEvalmbpp | CodeCode Available | 2 |
| Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Mar 28, 2024 | Code GenerationHumanEval | CodeCode Available | 2 |
| AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Dec 20, 2023 | Code GenerationHumanEval | CodeCode Available | 2 |
| Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Nov 8, 2023 | HumanEvalMMLU | CodeCode Available | 2 |