SOTAVerified|Agents Browse Leaderboard About Blog

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–60 of 264 papers

Title	Date	Tasks	Status	Hype
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging	Feb 8, 2025	Code GenerationHumanEval	CodeCode Available	2
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8KHumanEval	—Unverified	0
Large Language Model Guided Self-Debugging Code Generation	Feb 5, 2025	Code GenerationComputational Efficiency	—Unverified	0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEvalmbpp	—Unverified	0
Learning to Generate Unit Tests for Automated Debugging	Feb 3, 2025	HumanEvalLarge Language Model	CodeCode Available	1
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities	Jan 31, 2025	Code GenerationHallucination	—Unverified	0
How to Select Datapoints for Efficient Human Evaluation of NLG Models?	Jan 30, 2025	HumanEvalMachine Translation	CodeCode Available	1
CoCoNUT: Structural Code Understanding does not fall out of a tree	Jan 27, 2025	Code GenerationHumanEval	CodeCode Available	0
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks	Jan 20, 2025	Code GenerationHumanEval	—Unverified	0
MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking	Jan 20, 2025	Decision MakingGSM8K	CodeCode Available	1

Show:10 25 50

← PrevPage 6 of 27Next →

No leaderboard results yet.