HumanEval

Papers

Showing 201–225 of 264 papers

Title	Date	Tasks	Status	Hype	Score
Kotlin ML Pack: Technical Report	May 29, 2024	Code Generation, HumanEval	Unverified	0	0
KV Prediction for Improved Time to First Token	Oct 10, 2024	Code Completion, CPU	Unverified	0	0
Large Language Model Guided Self-Debugging Code Generation	Feb 5, 2025	Code Generation, Computational Efficiency	Unverified	0	0
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8K, HumanEval	Unverified	0	0
Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models	Feb 13, 2024	Code Generation, HumanEval	Unverified	0	0
Learning to Reason via Self-Iterative Process Feedback for Small Language Models	Dec 11, 2024	Domain Generalization, GSM8K	Unverified	0	0
Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs	Jan 14, 2025	Code Generation, HumanEval	Unverified	0	0
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code	Mar 12, 2024	Code Generation, HumanEval	Unverified	0	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8K, HumanEval	Unverified	0	0
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing	Jun 17, 2025	ARC, CoLA	Unverified	0	0
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression	Sep 25, 2023	Code Generation, HumanEval	Unverified	0	0
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation	Apr 17, 2024	Code Generation, HumanEval	Unverified	0	0
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants	Jul 12, 2024	HumanEval	Unverified	0	0
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding	Sep 9, 2024	Code Generation, HumanEval	Unverified	0	0
Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis	May 5, 2025	Articles, HumanEval	Unverified	0	0
MojoBench: Language Modeling and Benchmarks for Mojo	Oct 23, 2024	Code Generation, HumanEval	Unverified	0	0
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs	Jan 11, 2024	Code Generation, HumanEval	Unverified	0	0
NExT: Teaching Large Language Models to Reason about Code Execution	Apr 23, 2024	HumanEval, mbpp	Unverified	0	0
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness	Jan 29, 2024	HumanEval	Unverified	0	0
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation	Apr 26, 2024	Code Generation, HumanEval	Unverified	0	0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	Apr 5, 2025	Code Generation, HumanEval	Unverified	0	0
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback	Jul 27, 2023	Code Generation, HumanEval	Unverified	0	0
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion	Nov 13, 2023	Code Completion, HumanEval	Unverified	0	0
PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation	Dec 17, 2024	Code Generation, HumanEval	Unverified	0	0
Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic?	Oct 26, 2022	HumanEval, Language Modelling	Unverified	0	0

No leaderboard results yet.