HumanEval

Papers

Showing 41–50 of 264 papers

Title	Date	Tasks	Status	Hype
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models	Oct 6, 2023	Code Generation, Decision Making	Code Available	2
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions	Dec 20, 2022	Automated Theorem Proving, Code Generation	Code Available	2
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation	Aug 17, 2022	Benchmarking, Code Generation	Code Available	2
CodeT: Code Generation with Generated Tests	Jul 21, 2022	Code Generation, HumanEval	Code Available	2
Rethinking Verification for LLM Code Generation: From Generation to Testing	Jul 9, 2025	Code Generation, HumanEval	Code Available	1
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking	May 20, 2025	HumanEval, mbpp	Code Available	1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems	May 17, 2025	Arithmetic Reasoning, Code Generation	Code Available	1
Rethinking Repetition Problems of LLMs in Code Generation	May 15, 2025	Code Generation, HumanEval	Code Available	1
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code	May 5, 2025	Code Generation, GSM8K	Code Available	1
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing	Mar 10, 2025	Code Generation, HumanEval	Code Available	1

No leaderboard results yet.