SOTAVerified

MBPP

Papers

Showing 51–100 of 129 papers

Title | Status | Hype
Rethinking Repetition Problems of LLMs in Code Generation | Code | 1
RLTF: Reinforcement Learning from Unit Test Feedback | Code | 1
EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization | Code | 1
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness | Code | 1
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Code | 1
Discrete Flow Matching | - | 0
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs | - | 0
DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation | - | 0
Structured Chain-of-Thought Prompting for Code Generation | - | 0
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | - | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | - | 0
Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages | - | 0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity | - | 0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | - | 0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | - | 0
Self-Explained Keywords Empower Large Language Models for Code Generation | - | 0
What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces | - | 0
Interactive Code Generation via Test-Driven User-Intent Formalization | - | 0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | - | 0
Interval-censored Hawkes processes | - | 0
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | - | 0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | - | 0
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts | - | 0
Large Language Model-Aware In-Context Learning for Code Generation | - | 0
CodeMirage: Hallucinations in Code Generated by Large Language Models | - | 0
Test-Driven Development for Code Generation | - | 0
Learning to Reason via Self-Iterative Process Feedback for Small Language Models | - | 0
Textbooks Are All You Need | - | 0
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | - | 0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | - | 0
Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer | - | 0
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation | - | 0
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | - | 0
The Program Testing Ability of Large Language Models for Code | - | 0
The Stack: 3 TB of permissively licensed source code | - | 0
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | - | 0
Brevity is the soul of wit: Pruning long files for code generation | - | 0
NExT: Teaching Large Language Models to Reason about Code Execution | - | 0
Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement | - | 0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | - | 0
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs | - | 0
AceCoder: Utilizing Existing Code to Enhance Code Generation | - | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | - | 0
Type-Constrained Code Generation with Language Models | - | 0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | - | 0
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | - | 0
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting | - | 0
Prompt Baking | - | 0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | - | 0
Page 2 of 3

No leaderboard results yet.