HumanEval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–250 of 264 papers

Title	Date	Tasks	Status
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code GenerationGSM8K	—Unverified
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation	Sep 15, 2024	Code GenerationHumanEval	—Unverified
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	Jun 25, 2025	Code GenerationHumanEval	—Unverified
Scattered Forest Search: Smarter Code Space Exploration with LLMs	Oct 22, 2024	Code GenerationDiversity	—Unverified
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity	Dec 30, 2024	BenchmarkingCode Generation	—Unverified
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity	Sep 24, 2024	Code GenerationContrastive Learning	—Unverified
SelfEvolve: A Code Evolution Framework via Large Language Models	Jun 5, 2023	Code GenerationHumanEval	—Unverified
Self-Evolving Multi-Agent Collaboration Networks for Software Development	Oct 22, 2024	HumanEval	—Unverified
Self-Explained Keywords Empower Large Language Models for Code Generation	Oct 21, 2024	Code GenerationHumanEval	—Unverified
Semantic-guided Search for Efficient Program Repair with Large Language Models	Oct 22, 2024	GPUHumanEval	—Unverified
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models	Jul 30, 2024	BenchmarkingCode Completion	—Unverified
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths	May 30, 2024	GSM8KHumanEval	—Unverified
Stochastic Code Generation	Apr 14, 2023	Code GenerationDecoder	—Unverified
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Apr 4, 2025	BenchmarkingGSM8K	—Unverified
SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation	May 30, 2025	Code GenerationHumanEval	—Unverified
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models	Nov 11, 2024	Code GenerationHumanEval	—Unverified
Test-Driven Development for Code Generation	Feb 21, 2024	Code GenerationHumanEval	—Unverified
Textbooks Are All You Need	Jun 20, 2023	AllCode Generation	—Unverified
The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models	May 5, 2025	HumanEvalProgram Repair	—Unverified
The Program Testing Ability of Large Language Models for Code	Oct 9, 2023	HumanEvalmbpp	—Unverified
The Stack: 3 TB of permissively licensed source code	Nov 20, 2022	HumanEvalmbpp	—Unverified
Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement	Dec 30, 2024	Code GenerationHumanEval	—Unverified
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs	Aug 18, 2024	DiversityGPU	—Unverified
Towards Large Language Model Aided Program Refinement	Jun 26, 2024	HumanEvalLanguage Modeling	—Unverified
Turning the Tide: Repository-based Code Reflection	Jul 14, 2025	Code GenerationDiversity	—Unverified
Type-Constrained Code Generation with Language Models	Apr 12, 2025	Code GenerationHumanEval	—Unverified
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code GenerationHumanEval	—Unverified
Validating LLM-Generated Programs with Metamorphic Prompt Testing	Jun 11, 2024	HumanEval	—Unverified
VALTEST: Automated Validation of Language Model Generated Test Cases	Nov 13, 2024	HumanEvalLanguage Modeling	—Unverified
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents	Mar 23, 2024	Code GenerationHumanEval	—Unverified
Large Language Models Meet NL2Code: A Survey	Dec 19, 2022	HumanEvalSurvey	—Unverified
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models	Jan 15, 2024	HumanEvalLanguage Modelling	CodeCode Available
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	May 12, 2025	Code GenerationComment Generation	CodeCode Available
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation	Oct 28, 2023	Code GenerationHumanEval	CodeCode Available
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models	Jun 10, 2024	BenchmarkingCode Generation	CodeCode Available
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	FairnessGSM8K	CodeCode Available
Multi-Programming Language Ensemble for Code Generation in Large Language Model	Sep 6, 2024	Code GenerationHumanEval	CodeCode Available
mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation	Oct 19, 2024	Code GenerationDiversity	CodeCode Available
Large Language Models of Code Fail at Completing Code with Potential Bugs	Jun 6, 2023	Code CompletionHumanEval	CodeCode Available
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models	Sep 27, 2023	HumanEvalLanguage Modeling	CodeCode Available
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study	Mar 22, 2024	Code CompletionHumanEval	CodeCode Available
Measuring the Influence of Incorrect Code on Test Generation	Sep 14, 2024	HumanEvalLarge Language Model	CodeCode Available
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation	Nov 1, 2024	Code TranslationHumanEval	CodeCode Available
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality	Feb 13, 2025	8kGPU	CodeCode Available
Instruction Fusion: Advancing Prompt Evolution through Hybridization	Dec 25, 2023	Code GenerationHumanEval	CodeCode Available
RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance	Oct 2, 2024	Code GenerationHumanEval	CodeCode Available
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEvalmbpp	CodeCode Available
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	BenchmarkingHumanEval	CodeCode Available
HumanEval on Latest GPT Models -- 2024	Feb 20, 2024	Code GenerationHumanEval	CodeCode Available
CodeT5+: Open Code Large Language Models for Code Understanding and Generation	May 13, 2023	Arithmetic ReasoningCode Completion	CodeCode Available

Show:10 25 50

← PrevPage 5 of 6Next →

No leaderboard results yet.