HumanEval

Papers

Showing 91–100 of 264 papers

Title	Date	Tasks	Status	Hype	Score
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data	Sep 5, 2024	Code Generation, Diversity	Code Available	1	5
Better & Faster Large Language Models via Multi-token Prediction	Apr 30, 2024	HumanEval, MBPP	Code Available	1	5
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models	Feb 24, 2024	HumanEval, Memorization	Code Available	1	5
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models	Feb 23, 2025	Code Generation, HumanEval	Code Available	1	5
ContraCLM: Contrastive Learning For Causal Language Model	Oct 3, 2022	Code Generation, Code Search	Code Available	1	5
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1	5
ANPL: Towards Natural Programming with Interactive Decomposition	May 29, 2023	ARC, Code Generation	Code Available	1	5
Multi-lingual Evaluation of Code Generation Models	Oct 26, 2022	Code Completion, Code Generation	Code Available	1	5
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark	Jun 10, 2024	HumanEval, Program Synthesis	Code Available	1	5
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing	Mar 10, 2025	Code Generation, HumanEval	Code Available	1	5

No leaderboard results yet.