SOTAVerified|Agents Browse Leaderboard About Blog

mbpp

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–75 of 129 papers

Title	Date	Tasks	Status	Hype
Rethinking Repetition Problems of LLMs in Code Generation	May 15, 2025	Code GenerationHumanEval	CodeCode Available	1
RLTF: Reinforcement Learning from Unit Test Feedback	Jul 10, 2023	Code Generationmbpp	CodeCode Available	1
EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization	May 24, 2024	Code GenerationHumanEval	CodeCode Available	1
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast	May 23, 2024	Computational EfficiencyGSM8K	CodeCode Available	1
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness	Feb 13, 2024	HumanEvalmbpp	CodeCode Available	1
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts	Apr 23, 2024	HumanEvalmbpp	CodeCode Available	1
Discrete Flow Matching	Jul 22, 2024	HumanEvalmbpp	—Unverified	0
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs	Nov 20, 2024	Code GenerationHumanEval	—Unverified	0
DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation	Mar 13, 2025	Code Generationmbpp	—Unverified	0
Structured Chain-of-Thought Prompting for Code Generation	May 11, 2023	Code GenerationHumanEval	—Unverified	0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach	May 29, 2025	Code GenerationHumanEval	—Unverified	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8KMath	—Unverified	0
Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages	Jun 14, 2024	Code Generationmbpp	—Unverified	0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity	Sep 24, 2024	Code GenerationContrastive Learning	—Unverified	0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?	Mar 7, 2025	Code GenerationHumanEval	—Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8KHumanEval	—Unverified	0
Self-Explained Keywords Empower Large Language Models for Code Generation	Oct 21, 2024	Code GenerationHumanEval	—Unverified	0
What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces	Feb 10, 2025	Code Generationmbpp	—Unverified	0
Interactive Code Generation via Test-Driven User-Intent Formalization	Aug 11, 2022	Code GenerationHumanEval	—Unverified	0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency	Jun 18, 2024	HumanEvalmbpp	—Unverified	0
Interval-censored Hawkes processes	Apr 16, 2021	mbppPoint Processes	—Unverified	0
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models	Nov 11, 2024	Code GenerationHumanEval	—Unverified	0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	BenchmarkingCode Generation	—Unverified	0
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts	May 8, 2025	Code CompletionCode Generation	—Unverified	0
Large Language Model-Aware In-Context Learning for Code Generation	Oct 15, 2023	Code GenerationContrastive Learning	—Unverified	0

Show:10 25 50

← PrevPage 3 of 6Next →

No leaderboard results yet.