SOTAVerified

HumanEval

Papers

Showing 101-150 of 264 papers

Title | Status | Hype
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0
KV Prediction for Improved Time to First Token | Code | 0
Context-Augmented Code Generation Using Programming Knowledge Graphs | - | 0
AIME: AI System Optimization via Multiple LLM Evaluators | - | 0
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Code | 1
RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | Code | 0
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Code | 2
AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation | Code | 0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity | - | 0
Training Language Models to Self-Correct via Reinforcement Learning | Code | 2
GRIN: GRadient-INformed MoE | - | 0
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | - | 0
Measuring the Influence of Incorrect Code on Test Generation | Code | 0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | - | 0
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation | Code | 1
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | - | 0
Multi-Programming Language Ensemble for Code Generation in Large Language Model | Code | 0
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data | Code | 1
Planning In Natural Language Improves LLM Search For Code Generation | Code | 1
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | - | 0
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | - | 0
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | - | 0
AutoTest: Evolutionary Code Solution Selection with Test Cases | - | 0
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | - | 0
Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | - | 0
CodeMirage: Hallucinations in Code Generated by Large Language Models | - | 0
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding | - | 0
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Code | 7
ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models | Code | 1
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | - | 0
Discrete Flow Matching | - | 0
Scaling Granite Code Models to 128K Context | Code | 4
Qwen2 Technical Report | Code | 13
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants | - | 0
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct | Code | 1
Brevity is the soul of wit: Pruning long files for code generation | - | 0
Towards Large Language Model Aided Program Refinement | - | 0
RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale | Code | 1
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | - | 0
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency | - | 0
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Code | 14
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | - | 0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | - | 0
Validating LLM-Generated Programs with Metamorphic Prompt Testing | - | 0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark | Code | 1
Does your data spark joy? Performance gains from domain upsampling at the end of training | - | 0
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning | Code | 1
Automatic Instruction Evolving for Large Language Models | Code | 3
Page 3 of 6

No leaderboard results yet.