SOTAVerified

HumanEval

Papers

Showing 1–25 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14 |
| Qwen2 Technical Report | Code | 13 |
| AutoDev: Automated AI-Driven Development | Code | 11 |
| LLM4Decompile: Decompiling Binary Code with Large Language Models | Code | 9 |
| CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Code | 7 |
| CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Code | 7 |
| Code Llama: Open Foundation Models for Code | Code | 6 |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | Code | 6 |
| RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Code | 5 |
| StarCoder: may the source be with you! | Code | 5 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Code | 5 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Code | 5 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Code | 4 |
| Scaling Granite Code Models to 128K Context | Code | 4 |
| Baichuan 2: Open Large-scale Language Models | Code | 4 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Code | 4 |
| Magicoder: Empowering Code Generation with OSS-Instruct | Code | 4 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Code | 4 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| Automatic Instruction Evolving for Large Language Models | Code | 3 |
| OctoPack: Instruction Tuning Code Large Language Models | Code | 3 |
| Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Code | 3 |
| Evaluating Large Language Models Trained on Code | Code | 3 |
| KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Code | 3 |

No leaderboard results yet.