
mbpp

Papers


Showing 51–75 of 129 papers

Title	Date	Tasks	Status	Hype
RLTF: Reinforcement Learning from Unit Test Feedback	Jul 10, 2023	Code Generation, mbpp	Code Available	1
LeTI: Learning to Generate from Textual Interactions	May 17, 2023	Code Generation, Event Argument Extraction	Code Available	1
Improving Code Generation by Training with Natural Language Feedback	Mar 28, 2023	Code Generation, Imitation Learning	Code Available	1
ReCode: Robustness Evaluation of Code Generation Models	Dec 20, 2022	Code Generation, HumanEval	Code Available	1
Fault-Aware Neural Code Rankers	Jun 4, 2022	Code Generation, HumanEval	Code Available	1
Program Synthesis with Large Language Models	Aug 16, 2021	Few-Shot Learning, mbpp	Code Available	1
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	Jun 25, 2025	Code Generation, HumanEval	Unverified	0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models	Jun 23, 2025	Code Completion, GSM8K	Unverified	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8K, Math	Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8K, HumanEval	Unverified	0
Self-Correcting Code Generation Using Small Language Models	May 29, 2025	Code Generation, HumanEval	Code Available	0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach	May 29, 2025	Code Generation, HumanEval	Unverified	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8K, HumanEval	Unverified	0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code Generation, GSM8K	Unverified	0
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts	May 8, 2025	Code Completion, Code Generation	Unverified	0
Type-Constrained Code Generation with Language Models	Apr 12, 2025	Code Generation, HumanEval	Unverified	0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	Apr 5, 2025	Code Generation, HumanEval	Unverified	0
DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation	Mar 13, 2025	Code Generation, mbpp	Unverified	0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code Generation, HumanEval	Unverified	0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	Benchmarking, Code Generation	Unverified	0
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning	Feb 19, 2025	mbpp	Unverified	0
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code Generation, HumanEval	Unverified	0
What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces	Feb 10, 2025	Code Generation, mbpp	Unverified	0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8K, HumanEval	Unverified	0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEval, mbpp	Unverified	0



No leaderboard results yet.