SOTAVerified|Agents Browse Leaderboard About

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 176–200 of 264 papers

Title	Date	Tasks	Status	Hype	Score
Does Few-Shot Learning Help LLM Performance in Code Synthesis?	Dec 3, 2024	Code GenerationFew-Shot Learning	—Unverified	0	0
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8KHumanEval	—Unverified	0	0
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation	Aug 23, 2024	Code GenerationHumanEval	—Unverified	0	0
Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Dec 25, 2024	CPUGPU	—Unverified	0	0
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs	Nov 20, 2024	Code GenerationHumanEval	—Unverified	0	0
Dynamic Scaling of Unit Tests for Code Reward Modeling	Jan 2, 2025	Code GenerationHumanEval	—Unverified	0	0
Structured Chain-of-Thought Prompting for Code Generation	May 11, 2023	Code GenerationHumanEval	—Unverified	0	0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach	May 29, 2025	Code GenerationHumanEval	—Unverified	0	0
Evaluating Large Language Models for Code Review	May 26, 2025	HumanEval	—Unverified	0	0
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?	Mar 25, 2024	HumanEval	—Unverified	0	0
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation	Apr 1, 2024	Code GenerationHallucination	—Unverified	0	0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree	Dec 17, 2024	GSM8KHumanEval	—Unverified	0	0
From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation?	May 24, 2025	Code GenerationHumanEval	—Unverified	0	0
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models	Mar 10, 2025	HumanEvalProgram Synthesis	—Unverified	0	0
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks	Oct 15, 2024	HumanEvalLanguage Modelling	—Unverified	0	0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code GenerationHumanEval	—Unverified	0	0
GRIN: GRadient-INformed MoE	Sep 18, 2024	HellaSwagHumanEval	—Unverified	0	0
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees	Jun 17, 2025	Code TranslationHumanEval	—Unverified	0	0
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks	Jan 11, 2025	Code GenerationHumanEval	—Unverified	0	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8KHumanEval	—Unverified	0	0
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities	Jan 31, 2025	Code GenerationHallucination	—Unverified	0	0
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models	Dec 18, 2024	HumanEvalImitation Learning	—Unverified	0	0
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion	Jan 6, 2025	GSM8KHumanEval	—Unverified	0	0
Interactive Code Generation via Test-Driven User-Intent Formalization	Aug 11, 2022	Code GenerationHumanEval	—Unverified	0	0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	BenchmarkingCode Generation	—Unverified	0	0

Show:10 25 50

← PrevPage 8 of 11Next →

No leaderboard results yet.