mbpp

Papers

Showing 26–50 of 129 papers

Title	Date	Tasks	Status	Hype
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation	Sep 11, 2024	Code Generation, HumanEval	Code Available	1
Fault-Aware Neural Code Rankers	Jun 4, 2022	Code Generation, HumanEval	Code Available	1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'	Oct 29, 2024	Code Completion, Code Generation	Code Available	1
Clover: Closed-Loop Verifiable Code Generation	Oct 26, 2023	Code Generation, mbpp	Code Available	1
Control LLM: Controlled Evolution for Intelligence Retention in LLM	Jan 19, 2025	Math, Mathematical Reasoning	Code Available	1
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation	Dec 30, 2024	Code Generation, HumanEval	Code Available	1
Program Synthesis with Large Language Models	Aug 16, 2021	Few-Shot Learning, mbpp	Code Available	1
RLTF: Reinforcement Learning from Unit Test Feedback	Jul 10, 2023	Code Generation, mbpp	Code Available	1
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models	Jan 12, 2024	Code Generation, HumanEval	Code Available	1
MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation	May 19, 2024	Code Generation, HumanEval	Code Available	1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1
LeTI: Learning to Generate from Textual Interactions	May 17, 2023	Code Generation, Event Argument Extraction	Code Available	1
Getting the most out of your tokenizer for pre-training and domain adaptation	Feb 1, 2024	Code Generation, Domain Adaptation	Code Available	1
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning	Feb 14, 2024	Code Generation, HumanEval	Code Available	1
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback	Nov 18, 2024	HumanEval, mbpp	Code Available	1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models	Mar 11, 2024	Code Generation, HumanEval	Code Available	1
Learning to Generate Unit Tests for Automated Debugging	Feb 3, 2025	HumanEval, Large Language Model	Code Available	1
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling	Jun 17, 2024	GSM8K, Math	Code Available	1
Better & Faster Large Language Models via Multi-token Prediction	Apr 30, 2024	HumanEval, mbpp	Code Available	1
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct	Jul 8, 2024	Code Generation, Code Summarization	Code Available	1
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models	Feb 23, 2025	Code Generation, HumanEval	Code Available	1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules	Oct 13, 2023	Code Generation, HumanEval	Code Available	1
CYCLE: Learning to Self-Refine the Code Generation	Mar 27, 2024	Code Generation, HumanEval	Code Available	1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion	Oct 17, 2023	Code Completion, HumanEval	Code Available	1
Improving Code Generation by Training with Natural Language Feedback	Mar 28, 2023	Code Generation, Imitation Learning	Code Available	1

No leaderboard results yet.