HumanEval

Papers

Showing 71–80 of 264 papers

Title	Date	Tasks	Status	Hype
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data	Sep 5, 2024	Code Generation, Diversity	Code Available	1
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation	Dec 30, 2024	Code Generation, HumanEval	Code Available	1
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking	May 20, 2025	HumanEval, mbpp	Code Available	1
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark	Jun 10, 2024	HumanEval, Program Synthesis	Code Available	1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'	Oct 29, 2024	Code Completion, Code Generation	Code Available	1
Is Self-Repair a Silver Bullet for Code Generation?	Jun 16, 2023	Code Generation, HumanEval	Code Available	1
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization	Feb 26, 2024	Code Generation, HumanEval	Code Available	1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models	Mar 11, 2024	Code Generation, HumanEval	Code Available	1
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models	Feb 24, 2024	HumanEval, Memorization	Code Available	1
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding	Feb 19, 2024	HumanEval, Language Modeling	Code Available	1

No leaderboard results yet.