SOTAVerified

HumanEval

Papers

Showing 1-25 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14 |
| Qwen2 Technical Report | Code | 13 |
| AutoDev: Automated AI-Driven Development | Code | 11 |
| LLM4Decompile: Decompiling Binary Code with Large Language Models | Code | 9 |
| CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Code | 7 |
| CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | Code | 7 |
| Code Llama: Open Foundation Models for Code | Code | 6 |
| CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | Code | 6 |
| RLHF Workflow: From Reward Modeling to Online RLHF | Code | 5 |
| OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Code | 5 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Code | 5 |
| StarCoder: may the source be with you! | Code | 5 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Code | 5 |
| Scaling Granite Code Models to 128K Context | Code | 4 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | Code | 4 |
| CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution | Code | 4 |
| Magicoder: Empowering Code Generation with OSS-Instruct | Code | 4 |
| Baichuan 2: Open Large-scale Language Models | Code | 4 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | Code | 4 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | Code | 3 |
| DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3 |
| KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Code | 3 |
| SelfCodeAlign: Self-Alignment for Code Generation | Code | 3 |
| Automatic Instruction Evolving for Large Language Models | Code | 3 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |

No leaderboard results yet.