HumanEval

Papers

Showing 226–250 of 264 papers

Title	Date	Tasks	Status
Type-Constrained Code Generation with Language Models	Apr 12, 2025	Code Generation, HumanEval	Unverified
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code Generation, HumanEval	Unverified
Validating LLM-Generated Programs with Metamorphic Prompt Testing	Jun 11, 2024	HumanEval	Unverified
VALTEST: Automated Validation of Language Model Generated Test Cases	Nov 13, 2024	HumanEval, Language Modeling	Unverified
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents	Mar 23, 2024	Code Generation, HumanEval	Unverified
Large Language Models Meet NL2Code: A Survey	Dec 19, 2022	HumanEval, Survey	Unverified
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models	Jan 15, 2024	HumanEval, Language Modelling	Code Available
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	May 12, 2025	Code Generation, Comment Generation	Code Available
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation	Oct 28, 2023	Code Generation, HumanEval	Code Available
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	Benchmarking, Code Generation	Code Available
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	Fairness, GSM8K	Code Available
Multi-Programming Language Ensemble for Code Generation in Large Language Model	Sep 6, 2024	Code Generation, HumanEval	Code Available
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Oct 19, 2024	Code Generation, Diversity	Code Available
Large Language Models of Code Fail at Completing Code with Potential Bugs	Jun 6, 2023	Code Completion, HumanEval	Code Available
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models	Sep 27, 2023	HumanEval, Language Modeling	Code Available
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study	Mar 22, 2024	Code Completion, HumanEval	Code Available
Measuring the Influence of Incorrect Code on Test Generation	Sep 14, 2024	HumanEval, Large Language Model	Code Available
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation	Nov 1, 2024	Code Translation, HumanEval	Code Available
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8k, GPU	Code Available
Instruction Fusion: Advancing Prompt Evolution through Hybridization	Dec 25, 2023	Code Generation, HumanEval	Code Available
RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance	Oct 2, 2024	Code Generation, HumanEval	Code Available
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEval, mbpp	Code Available
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	Benchmarking, HumanEval	Code Available
HumanEval on Latest GPT Models -- 2024	Feb 20, 2024	Code Generation, HumanEval	Code Available
CodeT5+: Open Code Large Language Models for Code Understanding and Generation	May 13, 2023	Arithmetic Reasoning, Code Completion	Code Available
