HumanEval

Papers

Showing 176–200 of 264 papers

Title	Date	Tasks	Status	Hype
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation	Apr 1, 2024	Code Generation, Hallucination	Unverified	0
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM	Mar 28, 2024	Code Generation, HumanEval	Code Available	2
CYCLE: Learning to Self-Refine the Code Generation	Mar 27, 2024	Code Generation, HumanEval	Code Available	1
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?	Mar 25, 2024	HumanEval	Unverified	0
CodeShell Technical Report	Mar 23, 2024	8k, HumanEval	Unverified	0
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents	Mar 23, 2024	Code Generation, HumanEval	Unverified	0
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study	Mar 22, 2024	Code Completion, HumanEval	Code Available	0
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences	Mar 14, 2024	HumanEval	Code Available	7
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge	Mar 13, 2024	Dialogue Evaluation, HumanEval	Unverified	0
Software Vulnerability and Functionality Assessment using LLMs	Mar 13, 2024	Code Generation, HumanEval	Unverified	0
AutoDev: Automated AI-Driven Development	Mar 13, 2024	Code Generation, HumanEval	Code Available	11
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code	Mar 12, 2024	Code Generation, HumanEval	Unverified	0
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models	Mar 11, 2024	Code Generation, HumanEval	Code Available	1
LLM4Decompile: Decompiling Binary Code with Large Language Models	Mar 8, 2024	HumanEval	Code Available	9
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization	Feb 26, 2024	Code Generation, HumanEval	Code Available	1
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step	Feb 25, 2024	Code Generation, HumanEval	Code Available	4
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models	Feb 24, 2024	HumanEval, Memorization	Code Available	1
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement	Feb 22, 2024	Code Generation, HumanEval	Code Available	5
Test-Driven Development for Code Generation	Feb 21, 2024	Code Generation, HumanEval	Unverified	0
HumanEval on Latest GPT Models -- 2024	Feb 20, 2024	Code Generation, HumanEval	Code Available	0
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding	Feb 19, 2024	HumanEval, Language Modeling	Code Available	1
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning	Feb 14, 2024	Code Generation, HumanEval	Code Available	1
Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models	Feb 13, 2024	Code Generation, HumanEval	Unverified	0
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness	Feb 13, 2024	HumanEval, mbpp	Code Available	1
Getting the most out of your tokenizer for pre-training and domain adaptation	Feb 1, 2024	Code Generation, Domain Adaptation	Code Available	1
