HumanEval

Papers

Showing 31–40 of 264 papers

Title	Date	Tasks	Status	Hype
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement	Apr 29, 2025	Code Generation, HumanEval	Unverified	0
DataDecide: How to Predict Best Pretraining Data with Small Experiments	Apr 15, 2025	ARC, HellaSwag	Code Available	3
Type-Constrained Code Generation with Language Models	Apr 12, 2025	Code Generation, HumanEval	Unverified	0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	Apr 5, 2025	Code Generation, HumanEval	Unverified	0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Apr 4, 2025	Benchmarking, GSM8K	Unverified	0
Can LLMs Enable Verification in Mainstream Programming?	Mar 18, 2025	Code Generation, HumanEval	Unverified	0
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models	Mar 10, 2025	HumanEval, Program Synthesis	Unverified	0
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing	Mar 10, 2025	Code Generation, HumanEval	Code Available	1
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol	Mar 7, 2025	Benchmarking, Bug fixing	Unverified	0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code Generation, HumanEval	Unverified	0

No leaderboard results yet.