SOTAVerified

HumanEval

Papers

Showing 151-200 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement | | 0 |
| Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference | | 0 |
| Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | | 0 |
| PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation | | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | | 0 |
| AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement | | 0 |
| Does Few-Shot Learning Help LLM Performance in Code Synthesis? | | 0 |
| Addressing Data Leakage in HumanEval Using Combinatorial Test Design | | 0 |
| Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers | Code | 0 |
| A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks | | 0 |
| DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs | | 0 |
| VALTEST: Automated Validation of Language Model Generated Test Cases | | 0 |
| Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | | 0 |
| CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | | 0 |
| InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation | Code | 0 |
| Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models | | 0 |
| FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system | Code | 0 |
| Aligning CodeLLMs with Direct Preference Optimization | | 0 |
| Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | | 0 |
| MojoBench: Language Modeling and Benchmarks for Mojo | | 0 |
| Self-Evolving Multi-Agent Collaboration Networks for Software Development | | 0 |
| Scattered Forest Search: Smarter Code Space Exploration with LLMs | | 0 |
| Semantic-guided Search for Efficient Program Repair with Large Language Models | | 0 |
| Self-Explained Keywords Empower Large Language Models for Code Generation | | 0 |
| mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation | Code | 0 |
| CELI: Controller-Embedded Language Model Interactions | | 0 |
| G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| KV Prediction for Improved Time to First Token | | 0 |
| Context-Augmented Code Generation Using Programming Knowledge Graphs | | 0 |
| AIME: AI System Optimization via Multiple LLM Evaluators | | 0 |
| RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Code | 0 |
| AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation | Code | 0 |
| Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity | | 0 |
| GRIN: GRadient-INformed MoE | | 0 |
| RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | | 0 |
| Measuring the Influence of Incorrect Code on Test Generation | Code | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0 |
| USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | | 0 |
| Multi-Programming Language Ensemble for Code Generation in Large Language Model | Code | 0 |
| Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | | 0 |
| CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | | 0 |
| DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | | 0 |
| AutoTest: Evolutionary Code Solution Selection with Test Cases | | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | | 0 |
| Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | | 0 |
| CodeMirage: Hallucinations in Code Generated by Large Language Models | | 0 |
| CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | | 0 |

No leaderboard results yet.