SOTAVerified|Agents Browse Leaderboard About

HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 126–150 of 264 papers

Title	Date	Tasks	Status	Hype	Score
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation	Nov 1, 2024	Code TranslationHumanEval	CodeCode Available	0	5
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual LearningGSM8K	CodeCode Available	0	5
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Oct 19, 2024	Code GenerationDiversity	CodeCode Available	0	5
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation	Oct 28, 2023	Code GenerationHumanEval	CodeCode Available	0	5
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	May 12, 2025	Code GenerationComment Generation	CodeCode Available	0	5
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8KHumanEval	CodeCode Available	0	5
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	FairnessGSM8K	CodeCode Available	0	5
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEvalmbpp	CodeCode Available	0	5
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	BenchmarkingCode Generation	CodeCode Available	0	5
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8kGPU	CodeCode Available	0	5
Software Vulnerability and Functionality Assessment using LLMs	Mar 13, 2024	Code GenerationHumanEval	—Unverified	0	0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEvalmbpp	—Unverified	0	0
Actor-Critic based Online Data Mixing For Language Model Pre-Training	May 29, 2025	HumanEvalLanguage Modeling	—Unverified	0	0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8KHumanEval	—Unverified	0	0
Addressing Data Leakage in HumanEval Using Combinatorial Test Design	Dec 2, 2024	HumanEval	—Unverified	0	0
AIME: AI System Optimization via Multiple LLM Evaluators	Oct 4, 2024	Code GenerationHumanEval	—Unverified	0	0
Aligning CodeLLMs with Direct Preference Optimization	Oct 24, 2024	Decision MakingHumanEval	—Unverified	0	0
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement	Dec 9, 2024	Code GenerationHumanEval	—Unverified	0	0
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks	May 27, 2025	Code GenerationCode Summarization	—Unverified	0	0
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks	Nov 23, 2024	Code GenerationHumanEval	—Unverified	0	0
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement	Apr 29, 2025	Code GenerationHumanEval	—Unverified	0	0
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining	Sep 3, 2024	Code GenerationHumanEval	—Unverified	0	0
A Review of Repository Level Prompting for LLMs	Dec 15, 2023	Code CompletionCode Generation	—Unverified	0	0
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge	Mar 13, 2024	Dialogue EvaluationHumanEval	—Unverified	0	0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection	May 12, 2025	GSM8KHumanEval	—Unverified	0	0

Show:10 25 50

← PrevPage 6 of 11Next →

No leaderboard results yet.