HumanEval

Papers

Showing 201–225 of 264 papers

Title	Date	Tasks	Status	Hype
Discrete Flow Matching	Jul 22, 2024	HumanEval, MBPP	Unverified	0
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants	Jul 12, 2024	HumanEval	Unverified	0
Brevity is the soul of wit: Pruning long files for code generation	Jun 29, 2024	Code Generation, HumanEval	Unverified	0
Towards Large Language Model Aided Program Refinement	Jun 26, 2024	HumanEval, Language Modeling	Unverified	0
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models	Jun 20, 2024	Code Generation, HumanEval	Unverified	0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency	Jun 18, 2024	HumanEval, MBPP	Unverified	0
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual Learning, GSM8K	Code Available	0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	Benchmarking, HumanEval	Unverified	0
Validating LLM-Generated Programs with Metamorphic Prompt Testing	Jun 11, 2024	HumanEval	Unverified	0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases	Jun 11, 2024	Code Generation, HumanEval	Unverified	0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	0
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8K, HumanEval	Unverified	0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths	May 30, 2024	GSM8K, HumanEval	Unverified	0
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation	May 30, 2024	Code Generation, HumanEval	Unverified	0
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code	May 29, 2024	HumanEval	Unverified	0
Kotlin ML Pack: Technical Report	May 29, 2024	Code Generation, HumanEval	Unverified	0
Can Github issues be solved with Tree Of Thoughts?	May 20, 2024	Code Generation, GitHub issue resolution	Code Available	0
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation	Apr 26, 2024	Code Generation, HumanEval	Unverified	0
BASS: Batched Attention-optimized Speculative Sampling	Apr 24, 2024	GPU, HumanEval	Unverified	0
NExT: Teaching Large Language Models to Reason about Code Execution	Apr 23, 2024	HumanEval, MBPP	Unverified	0
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation	Apr 17, 2024	Code Generation, HumanEval	Unverified	0
Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective	Apr 11, 2024	Code Generation, HumanEval	Code Available	0
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation	Apr 1, 2024	Code Generation, Hallucination	Unverified	0
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?	Mar 25, 2024	HumanEval	Unverified	0
CodeShell Technical Report	Mar 23, 2024	8k, HumanEval	Unverified	0

No leaderboard results yet.