SOTAVerified

mbpp

Papers

Showing 26-50 of 129 papers

Title | Status | Hype
Control LLM: Controlled Evolution for Intelligence Retention in LLM | Code | 1
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | Code | 1
Planning-Driven Programming: A Large Language Model Programming Workflow | Code | 1
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback | Code | 1
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet' | Code | 1
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation | Code | 1
Planning In Natural Language Improves LLM Search For Code Generation | Code | 1
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct | Code | 1
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | Code | 1
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation | Code | 1
EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization | Code | 1
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation | Code | 1
Better & Faster Large Language Models via Multi-token Prediction | Code | 1
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Code | 1
CYCLE: Learning to Self-Refine the Code Generation | Code | 1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Code | 1
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning | Code | 1
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness | Code | 1
Getting the most out of your tokenizer for pre-training and domain adaptation | Code | 1
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models | Code | 1
Clover: Closed-Loop Verifiable Code Generation | Code | 1
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | Code | 1
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | Code | 1

No leaderboard results yet.