SOTAVerified

HumanEval

Papers

Showing 151–200 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Survey on Large Language Models for Code Generation | Code | 2 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0 |
| Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | | 0 |
| Kotlin ML Pack: Technical Report | | 0 |
| ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation | Code | 1 |
| EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization | Code | 1 |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1 |
| Instruction Tuning With Loss Over Instructions | Code | 1 |
| Can Github issues be solved with Tree Of Thoughts? | Code | 0 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1 |
| MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation | Code | 1 |
| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | Code | 2 |
| RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5 |
| NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | Code | 2 |
| Better & Faster Large Language Models via Multi-token Prediction | Code | 1 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | | 0 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| BASS: Batched Attention-optimized Speculative Sampling | | 0 |
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Code | 1 |
| NExT: Teaching Large Language Models to Reason about Code Execution | | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | | 0 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Code | 0 |
| The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers | Code | 1 |
| Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | Code | 1 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | | 0 |
| Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Code | 2 |
| CYCLE: Learning to Self-Refine the Code Generation | Code | 1 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | | 0 |
| CodeShell Technical Report | | 0 |
| SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | | 0 |
| Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study | Code | 0 |
| CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Code | 7 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | | 0 |
| Software Vulnerability and Functionality Assessment using LLMs | | 0 |
| AutoDev: Automated AI-Driven Development | Code | 11 |
| LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | | 0 |
| InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Code | 1 |
| LLM4Decompile: Decompiling Binary Code with Large Language Models | Code | 9 |
| HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization | Code | 1 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Code | 4 |
| Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models | Code | 1 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Code | 5 |
| Test-Driven Development for Code Generation | | 0 |
| HumanEval on Latest GPT Models -- 2024 | Code | 0 |
| Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding | Code | 1 |
| DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning | Code | 1 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | | 0 |
| Unsupervised Evaluation of Code LLMs with Round-Trip Correctness | Code | 1 |
| Getting the most out of your tokenizer for pre-training and domain adaptation | Code | 1 |
Page 4 of 6
