HumanEval

Papers


Showing 151–200 of 264 papers

Title	Date	Tasks	Status
AutoTest: Evolutionary Code Solution Selection with Test Cases	Aug 22, 2024	Code Generation, HumanEval	Unverified
BASS: Batched Attention-optimized Speculative Sampling	Apr 24, 2024	GPU, HumanEval	Unverified
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol	Mar 7, 2025	Benchmarking, Bug fixing	Unverified
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs	Jan 8, 2024	Code Generation, Diversity	Unverified
Brevity is the soul of wit: Pruning long files for code generation	Jun 29, 2024	Code Generation, HumanEval	Unverified
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation	Oct 16, 2023	Code Generation, HumanEval	Unverified
Can LLMs Enable Verification in Mainstream Programming?	Mar 18, 2025	Code Generation, HumanEval	Unverified
CELI: Controller-Embedded Language Model Interactions	Oct 18, 2024	Articles, Code Generation	Unverified
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation	Aug 17, 2023	Code Generation, Few-Shot Learning	Unverified
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model	Oct 10, 2023	Code Generation, Code Translation	Unverified
CodeMirage: Hallucinations in Code Generated by Large Language Models	Aug 14, 2024	Code Generation, Hallucination	Unverified
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts	May 8, 2025	Code Completion, Code Generation	Unverified
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency	Jun 18, 2024	HumanEval, mbpp	Unverified
CodeShell Technical Report	Mar 23, 2024	8k, HumanEval	Unverified
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models	Nov 7, 2024	Code Generation, Decision Making	Unverified
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting	Aug 18, 2024	HumanEval, Mathematical Reasoning	Unverified
Context-Augmented Code Generation Using Programming Knowledge Graphs	Oct 9, 2024	Code Generation, HumanEval	Unverified
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks	Sep 13, 2024	ARC, Code Generation	Unverified
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding	Aug 8, 2024	HumanEval, Retrieval	Unverified
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution	Aug 23, 2024	Code Generation, HumanEval	Unverified
Dafny as Verification-Aware Intermediate Language for Code Generation	Jan 10, 2025	Code Generation, HumanEval	Unverified
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data	Dec 5, 2023	Code Generation, HumanEval	Unverified
Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models	Oct 30, 2024	Code Generation, HumanEval	Unverified
Discrete Flow Matching	Jul 22, 2024	HumanEval, mbpp	Unverified
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation	May 30, 2024	Code Generation, HumanEval	Unverified
Does Few-Shot Learning Help LLM Performance in Code Synthesis?	Dec 3, 2024	Code Generation, Few-Shot Learning	Unverified
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8K, HumanEval	Unverified
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation	Aug 23, 2024	Code Generation, HumanEval	Unverified
Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference	Dec 25, 2024	CPU, GPU	Unverified
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs	Nov 20, 2024	Code Generation, HumanEval	Unverified
Dynamic Scaling of Unit Tests for Code Reward Modeling	Jan 2, 2025	Code Generation, HumanEval	Unverified
Structured Chain-of-Thought Prompting for Code Generation	May 11, 2023	Code Generation, HumanEval	Unverified
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach	May 29, 2025	Code Generation, HumanEval	Unverified
Evaluating Large Language Models for Code Review	May 26, 2025	HumanEval	Unverified
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?	Mar 25, 2024	HumanEval	Unverified
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation	Apr 1, 2024	Code Generation, Hallucination	Unverified
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree	Dec 17, 2024	GSM8K, HumanEval	Unverified
From Output to Evaluation: Does Raw Instruction-Tuned Code LLMs Output Suffice for Fill-in-the-Middle Code Generation?	May 24, 2025	Code Generation, HumanEval	Unverified
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models	Mar 10, 2025	HumanEval, Program Synthesis	Unverified
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks	Oct 15, 2024	HumanEval, Language Modelling	Unverified
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code Generation, HumanEval	Unverified
GRIN: GRadient-INformed MoE	Sep 18, 2024	HellaSwag, HumanEval	Unverified
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees	Jun 17, 2025	Code Translation, HumanEval	Unverified
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks	Jan 11, 2025	Code Generation, HumanEval	Unverified
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8K, HumanEval	Unverified
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities	Jan 31, 2025	Code Generation, Hallucination	Unverified
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models	Dec 18, 2024	HumanEval, Imitation Learning	Unverified
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion	Jan 6, 2025	GSM8K, HumanEval	Unverified
Interactive Code Generation via Test-Driven User-Intent Formalization	Aug 11, 2022	Code Generation, HumanEval	Unverified
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	Benchmarking, Code Generation	Unverified

