SOTAVerified|Agents Browse Leaderboard About

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 231–240 of 264 papers

Title	Date	Tasks	Status	Hype	Score
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks	Jan 20, 2025	Code GenerationHumanEval	—Unverified	0	0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	BenchmarkingHumanEval	—Unverified	0	0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8KHumanEval	—Unverified	0	0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code GenerationGSM8K	—Unverified	0	0
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation	Sep 15, 2024	Code GenerationHumanEval	—Unverified	0	0
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	Jun 25, 2025	Code GenerationHumanEval	—Unverified	0	0
Scattered Forest Search: Smarter Code Space Exploration with LLMs	Oct 22, 2024	Code GenerationDiversity	—Unverified	0	0
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity	Dec 30, 2024	BenchmarkingCode Generation	—Unverified	0	0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity	Sep 24, 2024	Code GenerationContrastive Learning	—Unverified	0	0
SelfEvolve: A Code Evolution Framework via Large Language Models	Jun 5, 2023	Code GenerationHumanEval	—Unverified	0	0

Show:10 25 50

← PrevPage 24 of 27Next →

No leaderboard results yet.