SOTAVerified|Agents Browse Leaderboard About Blog

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 121–130 of 264 papers

Title	Date	Tasks	Status	Hype	Score
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency	Sep 29, 2023	Code GenerationHumanEval	CodeCode Available	0	5
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	May 12, 2025	Code GenerationComment Generation	CodeCode Available	0	5
CoCoNUT: Structural Code Understanding does not fall out of a tree	Jan 27, 2025	Code GenerationHumanEval	CodeCode Available	0	5
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	FairnessGSM8K	CodeCode Available	0	5
Can Programming Languages Boost Each Other via Instruction Tuning?	Aug 31, 2023	HumanEval	CodeCode Available	0	5
Multi-Programming Language Ensemble for Code Generation in Large Language Model	Sep 6, 2024	Code GenerationHumanEval	CodeCode Available	0	5
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8KHumanEval	CodeCode Available	0	5
Can Github issues be solved with Tree Of Thoughts?	May 20, 2024	Code GenerationGitHub issue resolution	CodeCode Available	0	5
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	BenchmarkingCode Generation	CodeCode Available	0	5
Large Language Models of Code Fail at Completing Code with Potential Bugs	Jun 6, 2023	Code CompletionHumanEval	CodeCode Available	0	5

Show:10 25 50

← PrevPage 13 of 27Next →

No leaderboard results yet.