SOTAVerified

HumanEval

Papers

Showing 201-225 of 264 papers

| Title | Status | Hype |
| --- | --- | --- |
| Discrete Flow Matching | | 0 |
| MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | | 0 |
| Brevity is the soul of wit: Pruning long files for code generation | | 0 |
| Towards Large Language Model Aided Program Refinement | | 0 |
| Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | | 0 |
| Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0 |
| Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | | 0 |
| Validating LLM-Generated Programs with Metamorphic Prompt Testing | | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | | 0 |
| JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | | 0 |
| Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | | 0 |
| Kotlin ML Pack: Technical Report | | 0 |
| Can Github issues be solved with Tree Of Thoughts? | Code | 0 |
| On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation | | 0 |
| BASS: Batched Attention-optimized Speculative Sampling | | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | | 0 |
| Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | | 0 |
| Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective | Code | 0 |
| Exploring and Evaluating Hallucinations in LLM-Powered Code Generation | | 0 |
| Reasoning Runtime Behavior of a Program with LLM: How Far Are We? | | 0 |
| CodeShell Technical Report | | 0 |

No leaderboard results yet.