SOTAVerified

HumanEval

Papers

Showing 61–70 of 264 papers

Title | Status | Hype
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models | Code | 1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models | Code | 1
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation | Code | 1
Fault-Aware Neural Code Rankers | Code | 1
ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models | Code | 1
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding | Code | 1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Code | 1
Learning to Generate Unit Tests for Automated Debugging | Code | 1
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning | Code | 1

No leaderboard results yet.