SOTAVerified

mbpp

Papers

Showing 1-25 of 129 papers

Title | Status | Hype
any4: Learned 4-bit Numeric Representation for LLMs | Code | 2
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | | 0
Self-Correcting Code Generation Using Small Language Models | Code | 0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | Code | 1
Rethinking Repetition Problems of LLMs in Code Generation | Code | 1
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | Code | 3
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts | | 0
DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Code | 2
Type-Constrained Code Generation with Language Models | | 0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | | 0
DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation | | 0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | | 0
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Code | 3
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | | 0
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models | Code | 1
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning | | 0
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | | 0

No leaderboard results yet.