HumanEval

Papers

Showing 21–30 of 264 papers

Title	Date	Tasks	Status	Hype	Score
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks	May 12, 2025	Code Generation	Code Available	3	5
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8K, HumanEval	Code Available	3	5
SelfCodeAlign: Self-Alignment for Code Generation	Oct 31, 2024	Code Generation, HumanEval	Code Available	3	5
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding	Apr 25, 2024	GSM8K, HellaSwag	Code Available	3	5
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation	May 2, 2023	Code Generation, HumanEval	Code Available	3	5
OctoPack: Instruction Tuning Code Large Language Models	Aug 14, 2023	Code Generation, Code Repair	Code Available	3	5
Evaluating Large Language Models Trained on Code	Jul 7, 2021	Code Generation, HumanEval	Code Available	3	5
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding	Mar 4, 2025	HumanEval, mbpp	Code Available	3	5
CodeT: Code Generation with Generated Tests	Jul 21, 2022	Code Generation, HumanEval	Code Available	2	5
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging	Feb 8, 2025	Code Generation, HumanEval	Code Available	2	5

No leaderboard results yet.