SOTAVerified

HumanEval

Papers

Showing 76–100 of 264 papers

Title	Date	Tasks	Status	Hype
Addressing Data Leakage in HumanEval Using Combinatorial Test Design	Dec 2, 2024	HumanEval	Unverified	0
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEval, MBPP	Code Available	0
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks	Nov 23, 2024	Code Generation, HumanEval	Unverified	0
Planning-Driven Programming: A Large Language Model Programming Workflow	Nov 21, 2024	Code Generation, HumanEval	Code Available	1
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs	Nov 20, 2024	Code Generation, HumanEval	Unverified	0
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback	Nov 18, 2024	HumanEval, MBPP	Code Available	1
VALTEST: Automated Validation of Language Model Generated Test Cases	Nov 13, 2024	HumanEval, Language Modeling	Unverified	0
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models	Nov 11, 2024	Code Generation, HumanEval	Unverified	0
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models	Nov 7, 2024	Code Generation, Decision Making	Unverified	0
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation	Nov 1, 2024	Code Translation, HumanEval	Code Available	0
SelfCodeAlign: Self-Alignment for Code Generation	Oct 31, 2024	Code Generation, HumanEval	Code Available	3
Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models	Oct 30, 2024	Code Generation, HumanEval	Unverified	0
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'	Oct 29, 2024	Code Completion, Code Generation	Code Available	1
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system	Oct 28, 2024	Code Generation, HumanEval	Code Available	0
Aligning CodeLLMs with Direct Preference Optimization	Oct 24, 2024	Decision Making, HumanEval	Unverified	0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8K, HumanEval	Unverified	0
MojoBench: Language Modeling and Benchmarks for Mojo	Oct 23, 2024	Code Generation, HumanEval	Unverified	0
Scattered Forest Search: Smarter Code Space Exploration with LLMs	Oct 22, 2024	Code Generation, Diversity	Unverified	0
Self-Evolving Multi-Agent Collaboration Networks for Software Development	Oct 22, 2024	HumanEval	Unverified	0
Semantic-guided Search for Efficient Program Repair with Large Language Models	Oct 22, 2024	GPU, HumanEval	Unverified	0
Self-Explained Keywords Empower Large Language Models for Code Generation	Oct 21, 2024	Code Generation, HumanEval	Unverified	0
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Oct 19, 2024	Code Generation, Diversity	Code Available	0
CELI: Controller-Embedded Language Model Interactions	Oct 18, 2024	Articles, Code Generation	Unverified	0
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks	Oct 16, 2024	Code Generation, HumanEval	Code Available	1
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks	Oct 15, 2024	HumanEval, Language Modelling	Unverified	0

No leaderboard results yet.