SOTAVerified

HumanEval

Papers

Showing 51-75 of 264 papers

| Title | Status | Hype |
|---|---|---|
| CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | Code | 2 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0 |
| Large Language Model Guided Self-Debugging Code Generation | | 0 |
| ACECODER: Acing Coder RL via Automated Test-Case Synthesis | | 0 |
| Learning to Generate Unit Tests for Automated Debugging | Code | 1 |
| Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | | 0 |
| How to Select Datapoints for Efficient Human Evaluation of NLG Models? | Code | 1 |
| CoCoNUT: Structural Code Understanding does not fall out of a tree | Code | 0 |
| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | | 0 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Code | 1 |
| Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs | | 0 |
| Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks | | 0 |
| Dafny as Verification-Aware Intermediate Language for Code Generation | | 0 |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | | 0 |
| Dynamic Scaling of Unit Tests for Code Reward Modeling | | 0 |
| Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement | | 0 |
| HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Code | 1 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | | 0 |
| Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | | 0 |
| Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | | 0 |
| PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation | | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | | 0 |
| Does Few-Shot Learning Help LLM Performance in Code Synthesis? | | 0 |
Page 3 of 11

No leaderboard results yet.