SOTAVerified|Agents Browse Leaderboard About Blog

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 264 papers

Title	Date	Tasks	Status	Hype
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	BenchmarkingHumanEval	CodeCode Available	0
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding	Mar 4, 2025	HumanEvalmbpp	CodeCode Available	3
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8KHumanEval	—Unverified	0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	BenchmarkingCode Generation	—Unverified	0
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation	Feb 26, 2025	Code GenerationHumanEval	CodeCode Available	2
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models	Feb 23, 2025	Code GenerationHumanEval	CodeCode Available	1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code GenerationHumanEval	CodeCode Available	1
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code GenerationHumanEval	—Unverified	0
MasRouter: Learning to Route LLMs for Multi-Agent Systems	Feb 16, 2025	HumanEvalmbpp	CodeCode Available	2
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8kGPU	CodeCode Available	0

Show:10 25 50

← PrevPage 5 of 27Next →

No leaderboard results yet.