HumanEval

Papers

Showing 126–150 of 264 papers

Title	Date	Tasks	Status	Hype
CodeMirage: Hallucinations in Code Generated by Large Language Models	Aug 14, 2024	Code Generation, Hallucination	Unverified	0
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding	Aug 8, 2024	HumanEval, Retrieval	Unverified	0
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases	Aug 7, 2024	HumanEval, mbpp	Code Available	7
ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models	Aug 2, 2024	Code Generation, HumanEval	Code Available	1
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models	Jul 30, 2024	Benchmarking, Code Completion	Unverified	0
Discrete Flow Matching	Jul 22, 2024	HumanEval, mbpp	Unverified	0
Scaling Granite Code Models to 128K Context	Jul 18, 2024	2k, 4k	Code Available	4
Qwen2 Technical Report	Jul 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	13
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants	Jul 12, 2024	HumanEval	Unverified	0
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct	Jul 8, 2024	Code Generation, Code Summarization	Code Available	1
Brevity is the soul of wit: Pruning long files for code generation	Jun 29, 2024	Code Generation, HumanEval	Unverified	0
Towards Large Language Model Aided Program Refinement	Jun 26, 2024	HumanEval, Language Modeling	Unverified	0
RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale	Jun 24, 2024	Code Generation, HumanEval	Code Available	1
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models	Jun 20, 2024	Code Generation, HumanEval	Unverified	0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency	Jun 18, 2024	HumanEval, mbpp	Unverified	0
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	All, GSM8K	Code Available	14
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual Learning, GSM8K	Code Available	0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	Benchmarking, HumanEval	Unverified	0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases	Jun 11, 2024	Code Generation, HumanEval	Unverified	0
Validating LLM-Generated Programs with Metamorphic Prompt Testing	Jun 11, 2024	HumanEval	Unverified	0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	0
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark	Jun 10, 2024	HumanEval, Program Synthesis	Code Available	1
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8K, HumanEval	Unverified	0
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning	Jun 3, 2024	Code Completion, Code Generation	Code Available	1
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8K, HumanEval	Code Available	3

No leaderboard results yet.