SOTAVerified

HumanEval

Papers

Showing 31–40 of 264 papers

Title	Date	Tasks	Status	Hype
MasRouter: Learning to Route LLMs for Multi-Agent Systems	Feb 16, 2025	HumanEval, MBPP	Code Available	2
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging	Feb 8, 2025	Code Generation, HumanEval	Code Available	2
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging	Oct 2, 2024	Auto Debugging, Bug Fixing	Code Available	2
Training Language Models to Self-Correct via Reinforcement Learning	Sep 19, 2024	HumanEval, Math	Code Available	2
A Survey on Large Language Models for Code Generation	Jun 1, 2024	Code Generation, HumanEval	Code Available	2
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving	May 18, 2024	Code Generation, HumanEval	Code Available	2
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts	May 7, 2024	HumanEval, MBPP	Code Available	2
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM	Mar 28, 2024	Code Generation, HumanEval	Code Available	2
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation	Dec 20, 2023	Code Generation, HumanEval	Code Available	2
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples	Nov 8, 2023	HumanEval, MMLU	Code Available	2

No leaderboard results yet.