SOTAVerified|Agents Browse Leaderboard About

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 131–140 of 264 papers

Title	Date	Tasks	Status	Hype
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models	Mar 10, 2025	HumanEvalProgram Synthesis	—Unverified	0
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol	Mar 7, 2025	BenchmarkingBug fixing	—Unverified	0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code GenerationHumanEval	—Unverified	0
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	BenchmarkingHumanEval	CodeCode Available	0
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8KHumanEval	—Unverified	0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	BenchmarkingCode Generation	—Unverified	0
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code GenerationHumanEval	—Unverified	0
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8kGPU	CodeCode Available	0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8KHumanEval	—Unverified	0
Large Language Model Guided Self-Debugging Code Generation	Feb 5, 2025	Code GenerationComputational Efficiency	—Unverified	0

Show:10 25 50

← PrevPage 14 of 27Next →

No leaderboard results yet.