SOTAVerified|Agents Browse Leaderboard About Blog

mbpp

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 76–100 of 129 papers

Title	Date	Tasks	Status	Hype	Score
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision	Feb 5, 2024	GSM8KMath	—Unverified	0	0
Brevity is the soul of wit: Pruning long files for code generation	Jun 29, 2024	Code GenerationHumanEval	—Unverified	0	0
NExT: Teaching Large Language Models to Reason about Code Execution	Apr 23, 2024	HumanEvalmbpp	—Unverified	0	0
Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement	Dec 30, 2024	Code GenerationHumanEval	—Unverified	0	0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	Apr 5, 2025	Code GenerationHumanEval	—Unverified	0	0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs	Jan 8, 2024	Code GenerationDiversity	—Unverified	0	0
AceCoder: Utilizing Existing Code to Enhance Code Generation	Mar 31, 2023	Code Generationmbpp	—Unverified	0	0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models	Jun 23, 2025	Code CompletionGSM8K	—Unverified	0	0
Type-Constrained Code Generation with Language Models	Apr 12, 2025	Code GenerationHumanEval	—Unverified	0	0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases	Jun 11, 2024	Code GenerationHumanEval	—Unverified	0	0
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents	Mar 23, 2024	Code GenerationHumanEval	—Unverified	0	0
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting	May 25, 2024	Contrastive Learningmbpp	—Unverified	0	0
Prompt Baking	Sep 4, 2024	ARCGSM8K	—Unverified	0	0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning	Jun 20, 2024	GSM8KHeuristic Search	—Unverified	0	0
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks	Jan 20, 2025	Code GenerationHumanEval	—Unverified	0	0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8KHumanEval	—Unverified	0	0
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code GenerationHumanEval	—Unverified	0	0
Aligning CodeLLMs with Direct Preference Optimization	Oct 24, 2024	Decision MakingHumanEval	—Unverified	0	0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code GenerationGSM8K	—Unverified	0	0
VALTEST: Automated Validation of Language Model Generated Test Cases	Nov 13, 2024	HumanEvalLanguage Modeling	—Unverified	0	0
ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity	Dec 12, 2023	Code GenerationLanguage Modeling	—Unverified	0	0
Context-Augmented Code Generation Using Programming Knowledge Graphs	Oct 9, 2024	Code GenerationHumanEval	—Unverified	0	0
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement	Dec 9, 2024	Code GenerationHumanEval	—Unverified	0	0
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	Jun 25, 2025	Code GenerationHumanEval	—Unverified	0	0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEvalmbpp	—Unverified	0	0

Show:10 25 50

← PrevPage 4 of 6Next →

No leaderboard results yet.