SOTAVerified

HumanEval

Papers

Showing 26-50 of 264 papers

Title | Status | Hype
OctoPack: Instruction Tuning Code Large Language Models | Code | 3
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Code | 3
Evaluating Large Language Models Trained on Code | Code | 3
any4: Learned 4-bit Numeric Representation for LLMs | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
MasRouter: Learning to Route LLMs for Multi-Agent Systems | Code | 2
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | Code | 2
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Code | 2
Training Language Models to Self-Correct via Reinforcement Learning | Code | 2
A Survey on Large Language Models for Code Generation | Code | 2
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | Code | 2
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | Code | 2
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM | Code | 2
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | Code | 2
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples | Code | 2
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Code | 2
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions | Code | 2
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Code | 2
CodeT: Code Generation with Generated Tests | Code | 2
Rethinking Verification for LLM Code Generation: From Generation to Testing | Code | 1
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | Code | 1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1
Rethinking Repetition Problems of LLMs in Code Generation | Code | 1
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | Code | 1
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing | Code | 1

No leaderboard results yet.