HumanEval

Papers


Showing 101–150 of 264 papers

Title	Date	Tasks	Status	Hype	Score
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking	May 20, 2025	HumanEval, mbpp	Code Available	1	5
ContraCLM: Contrastive Learning For Causal Language Model	Oct 3, 2022	Code Generation, Code Search	Code Available	1	5
Can Programming Languages Boost Each Other via Instruction Tuning?	Aug 31, 2023	HumanEval	Code Available	0	5
AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation	Oct 1, 2024	Code Generation, HumanEval	Code Available	0	5
Instruction Fusion: Advancing Prompt Evolution through Hybridization	Dec 25, 2023	Code Generation, HumanEval	Code Available	0	5
HumanEval on Latest GPT Models -- 2024	Feb 20, 2024	Code Generation, HumanEval	Code Available	0	5
Measuring the Influence of Incorrect Code on Test Generation	Sep 14, 2024	HumanEval, Large Language Model	Code Available	0	5
Large Language Models of Code Fail at Completing Code with Potential Bugs	Jun 6, 2023	Code Completion, HumanEval	Code Available	0	5
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings	May 19, 2025	HumanEval, Math	Code Available	0	5
Can Github issues be solved with Tree Of Thoughts?	May 20, 2024	Code Generation, GitHub issue resolution	Code Available	0	5
RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance	Oct 2, 2024	Code Generation, HumanEval	Code Available	0	5
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models	Sep 27, 2023	HumanEval, Language Modeling	Code Available	0	5
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system	Oct 28, 2024	Code Generation, HumanEval	Code Available	0	5
Using Large Language Models to Generate JUnit Tests: An Empirical Study	Apr 30, 2023	Code Generation, HumanEval	Code Available	0	5
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study	Mar 22, 2024	Code Completion, HumanEval	Code Available	0	5
Evaluating How Fine-tuning on Bimodal Data Effects Code Generation	Nov 15, 2022	Code Generation, HumanEval	Code Available	0	5
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models	Jan 15, 2024	HumanEval, Language Modelling	Code Available	0	5
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency	Sep 29, 2023	Code Generation, HumanEval	Code Available	0	5
Self-Correcting Code Generation Using Small Language Models	May 29, 2025	Code Generation, HumanEval	Code Available	0	5
Self-Edit: Fault-Aware Code Editor for Code Generation	May 6, 2023	Code Generation, HumanEval	Code Available	0	5
CoCoNUT: Structural Code Understanding does not fall out of a tree	Jan 27, 2025	Code Generation, HumanEval	Code Available	0	5
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	Benchmarking, HumanEval	Code Available	0	5
Multi-Programming Language Ensemble for Code Generation in Large Language Model	Sep 6, 2024	Code Generation, HumanEval	Code Available	0	5
Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective	Apr 11, 2024	Code Generation, HumanEval	Code Available	0	5
CodeT5+: Open Code Large Language Models for Code Understanding and Generation	May 13, 2023	Arithmetic Reasoning, Code Completion	Code Available	0	5
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation	Nov 1, 2024	Code Translation, HumanEval	Code Available	0	5
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual Learning, GSM8K	Code Available	0	5
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Oct 19, 2024	Code Generation, Diversity	Code Available	0	5
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation	Oct 28, 2023	Code Generation, HumanEval	Code Available	0	5
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	May 12, 2025	Code Generation, Comment Generation	Code Available	0	5
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8K, HumanEval	Code Available	0	5
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	Fairness, GSM8K	Code Available	0	5
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEval, mbpp	Code Available	0	5
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available	0	5
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8k, GPU	Code Available	0	5
Software Vulnerability and Functionality Assessment using LLMs	Mar 13, 2024	Code Generation, HumanEval	Unverified	0	0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEval, mbpp	Unverified	0	0
Actor-Critic based Online Data Mixing For Language Model Pre-Training	May 29, 2025	HumanEval, Language Modeling	Unverified	0	0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8K, HumanEval	Unverified	0	0
Addressing Data Leakage in HumanEval Using Combinatorial Test Design	Dec 2, 2024	HumanEval	Unverified	0	0
AIME: AI System Optimization via Multiple LLM Evaluators	Oct 4, 2024	Code Generation, HumanEval	Unverified	0	0
Aligning CodeLLMs with Direct Preference Optimization	Oct 24, 2024	Decision Making, HumanEval	Unverified	0	0
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement	Dec 9, 2024	Code Generation, HumanEval	Unverified	0	0
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks	May 27, 2025	Code Generation, Code Summarization	Unverified	0	0
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks	Nov 23, 2024	Code Generation, HumanEval	Unverified	0	0
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement	Apr 29, 2025	Code Generation, HumanEval	Unverified	0	0
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining	Sep 3, 2024	Code Generation, HumanEval	Unverified	0	0
A Review of Repository Level Prompting for LLMs	Dec 15, 2023	Code Completion, Code Generation	Unverified	0	0
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge	Mar 13, 2024	Dialogue Evaluation, HumanEval	Unverified	0	0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection	May 12, 2025	GSM8K, HumanEval	Unverified	0	0

