SOTAVerified

HumanEval

Papers

Showing 226-250 of 264 papers

Title | Status | Hype
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0
PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | | 0
Prior Prompt Engineering for Reinforcement Fine-Tuning | | 0
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | | 0
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | | 0
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | | 0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | | 0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | | 0
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization | | 0
Scattered Forest Search: Smarter Code Space Exploration with LLMs | | 0
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | | 0
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity | | 0
SelfEvolve: A Code Evolution Framework via Large Language Models | | 0
Self-Evolving Multi-Agent Collaboration Networks for Software Development | | 0
Self-Explained Keywords Empower Large Language Models for Code Generation | | 0
Semantic-guided Search for Efficient Program Repair with Large Language Models | | 0
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | | 0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0
Stochastic Code Generation | | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0
SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation | | 0
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models | | 0
Test-Driven Development for Code Generation | | 0

No leaderboard results yet.