SOTAVerified

HumanEval

Papers

Showing 201-250 of 264 papers

Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval
Kotlin ML Pack: Technical Report
Large Language Model Guided Self-Debugging Code Generation
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge
Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models
Learning to Reason via Self-Iterative Process Feedback for Small Language Models
Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants
USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding
Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis
MojoBench: Language Modeling and Benchmarks for Mojo
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs
NExT: Teaching Large Language Models to Reason about Code Execution
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion
PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation
Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic?
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases
Prior Prompt Engineering for Reinforcement Fine-Tuning
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization
Scattered Forest Search: Smarter Code Space Exploration with LLMs
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity
SelfEvolve: A Code Evolution Framework via Large Language Models
Self-Evolving Multi-Agent Collaboration Networks for Software Development
Self-Explained Keywords Empower Large Language Models for Code Generation
Semantic-guided Search for Efficient Program Repair with Large Language Models
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Stochastic Code Generation
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency
SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
Test-Driven Development for Code Generation

No leaderboard results yet.