mbpp

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–50 of 129 papers

Title	Date	Tasks	Status	Hype
any4: Learned 4-bit Numeric Representation for LLMs	Jul 7, 2025	GPUGSM8K	CodeCode Available	2
EvoAgentX: An Automated Framework for Evolving Agentic Workflows	Jul 4, 2025	Code GenerationMath	CodeCode Available	7
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization	Jun 25, 2025	Code GenerationHumanEval	—Unverified	0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models	Jun 23, 2025	Code CompletionGSM8K	—Unverified	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8KMath	—Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8KHumanEval	—Unverified	0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach	May 29, 2025	Code GenerationHumanEval	—Unverified	0
Self-Correcting Code Generation Using Small Language Models	May 29, 2025	Code GenerationHumanEval	CodeCode Available	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8KHumanEval	—Unverified	0
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking	May 20, 2025	HumanEvalmbpp	CodeCode Available	1
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code GenerationGSM8K	—Unverified	0
Rethinking Repetition Problems of LLMs in Code Generation	May 15, 2025	Code GenerationHumanEval	CodeCode Available	1
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks	May 12, 2025	Code Generation	CodeCode Available	3
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts	May 8, 2025	Code CompletionCode Generation	—Unverified	0
DataDecide: How to Predict Best Pretraining Data with Small Experiments	Apr 15, 2025	ARCHellaSwag	CodeCode Available	3
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning	Apr 14, 2025	Mathematical Reasoningmbpp	CodeCode Available	2
Type-Constrained Code Generation with Language Models	Apr 12, 2025	Code GenerationHumanEval	—Unverified	0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	Apr 5, 2025	Code GenerationHumanEval	—Unverified	0
DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation	Mar 13, 2025	Code Generationmbpp	—Unverified	0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code GenerationHumanEval	—Unverified	0
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding	Mar 4, 2025	HumanEvalmbpp	CodeCode Available	3
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	BenchmarkingCode Generation	—Unverified	0
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models	Feb 23, 2025	Code GenerationHumanEval	CodeCode Available	1
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning	Feb 19, 2025	mbpp	—Unverified	0
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance	Feb 17, 2025	Code GenerationHumanEval	—Unverified	0
MasRouter: Learning to Route LLMs for Multi-Agent Systems	Feb 16, 2025	HumanEvalmbpp	CodeCode Available	2
What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces	Feb 10, 2025	Code Generationmbpp	—Unverified	0
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging	Feb 8, 2025	Code GenerationHumanEval	CodeCode Available	2
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8KHumanEval	—Unverified	0
Learning to Generate Unit Tests for Automated Debugging	Feb 3, 2025	HumanEvalLarge Language Model	CodeCode Available	1
ACECODER: Acing Coder RL via Automated Test-Case Synthesis	Feb 3, 2025	HumanEvalmbpp	—Unverified	0
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks	Jan 20, 2025	Code GenerationHumanEval	—Unverified	0
Control LLM: Controlled Evolution for Intelligence Retention in LLM	Jan 19, 2025	MathMathematical Reasoning	CodeCode Available	1
Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement	Dec 30, 2024	Code GenerationHumanEval	—Unverified	0
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation	Dec 30, 2024	Code GenerationHumanEval	CodeCode Available	1
Learning to Reason via Self-Iterative Process Feedback for Small Language Models	Dec 11, 2024	Domain GeneralizationGSM8K	—Unverified	0
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement	Dec 9, 2024	Code GenerationHumanEval	—Unverified	0
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers	Nov 26, 2024	HumanEvalmbpp	CodeCode Available	0
Planning-Driven Programming: A Large Language Model Programming Workflow	Nov 21, 2024	Code GenerationHumanEval	CodeCode Available	1
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs	Nov 20, 2024	Code GenerationHumanEval	—Unverified	0
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback	Nov 18, 2024	HumanEvalmbpp	CodeCode Available	1
VALTEST: Automated Validation of Language Model Generated Test Cases	Nov 13, 2024	HumanEvalLanguage Modeling	—Unverified	0
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models	Nov 11, 2024	Code GenerationHumanEval	—Unverified	0
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models	Nov 7, 2024	Code GenerationDecision Making	—Unverified	0
Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models	Oct 30, 2024	Code GenerationHumanEval	—Unverified	0
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'	Oct 29, 2024	Code CompletionCode Generation	CodeCode Available	1
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system	Oct 28, 2024	Code GenerationHumanEval	CodeCode Available	0
Aligning CodeLLMs with Direct Preference Optimization	Oct 24, 2024	Decision MakingHumanEval	—Unverified	0
Scattered Forest Search: Smarter Code Space Exploration with LLMs	Oct 22, 2024	Code GenerationDiversity	—Unverified	0
Self-Explained Keywords Empower Large Language Models for Code Generation	Oct 21, 2024	Code GenerationHumanEval	—Unverified	0

Show:10 25 50

← PrevPage 1 of 3Next →

No leaderboard results yet.