SOTAVerified|Agents Browse Leaderboard About Blog

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 21–30 of 264 papers

Title	Date	Tasks	Status	Hype
DataDecide: How to Predict Best Pretraining Data with Small Experiments	Apr 15, 2025	ARCHellaSwag	CodeCode Available	3
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding	Mar 4, 2025	HumanEvalmbpp	CodeCode Available	3
SelfCodeAlign: Self-Alignment for Code Generation	Oct 31, 2024	Code GenerationHumanEval	CodeCode Available	3
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8KHumanEval	CodeCode Available	3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding	Apr 25, 2024	GSM8KHellaSwag	CodeCode Available	3
OctoPack: Instruction Tuning Code Large Language Models	Aug 14, 2023	Code GenerationCode Repair	CodeCode Available	3
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation	May 2, 2023	Code GenerationHumanEval	CodeCode Available	3
Evaluating Large Language Models Trained on Code	Jul 7, 2021	Code GenerationHumanEval	CodeCode Available	3
any4: Learned 4-bit Numeric Representation for LLMs	Jul 7, 2025	GPUGSM8K	CodeCode Available	2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation	Feb 26, 2025	Code GenerationHumanEval	CodeCode Available	2

Show:10 25 50

← PrevPage 3 of 27Next →

No leaderboard results yet.