SOTAVerified|Agents Browse Leaderboard About

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 131–140 of 264 papers

Title	Date	Tasks	Status	Hype	Score
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8KHumanEval	CodeCode Available	0	5
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	FairnessGSM8K	CodeCode Available	0	5
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEvalmbpp	CodeCode Available	0	5
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	BenchmarkingCode Generation	CodeCode Available	0	5
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8kGPU	CodeCode Available	0	5
Software Vulnerability and Functionality Assessment using LLMs	Mar 13, 2024	Code GenerationHumanEval	—Unverified	0	0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEvalmbpp	—Unverified	0	0
Actor-Critic based Online Data Mixing For Language Model Pre-Training	May 29, 2025	HumanEvalLanguage Modeling	—Unverified	0	0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8KHumanEval	—Unverified	0	0
Addressing Data Leakage in HumanEval Using Combinatorial Test Design	Dec 2, 2024	HumanEval	—Unverified	0	0

Show:10 25 50

← PrevPage 14 of 27Next →

No leaderboard results yet.