HumanEval

Papers

Showing 181–190 of 264 papers

Title	Date	Tasks	Status	Hype
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding	Sep 9, 2024	Code Generation, HumanEval	Unverified	0
Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis	May 5, 2025	Articles, HumanEval	Unverified	0
MojoBench: Language Modeling and Benchmarks for Mojo	Oct 23, 2024	Code Generation, HumanEval	Unverified	0
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs	Jan 11, 2024	Code Generation, HumanEval	Unverified	0
NExT: Teaching Large Language Models to Reason about Code Execution	Apr 23, 2024	HumanEval, mbpp	Unverified	0
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness	Jan 29, 2024	HumanEval	Unverified	0
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation	Apr 26, 2024	Code Generation, HumanEval	Unverified	0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	Apr 5, 2025	Code Generation, HumanEval	Unverified	0
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback	Jul 27, 2023	Code Generation, HumanEval	Unverified	0
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion	Nov 13, 2023	Code Completion, HumanEval	Unverified	0

No leaderboard results yet.