SOTAVerified

HumanEval

Papers

Showing 126-150 of 264 papers

Title | Status | Hype
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | - | 0
Type-Constrained Code Generation with Language Models | - | 0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | - | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | - | 0
Can LLMs Enable Verification in Mainstream Programming? | - | 0
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | - | 0
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | - | 0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | - | 0
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Code | 0
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | - | 0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | - | 0
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | - | 0
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Code | 0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | - | 0
Large Language Model Guided Self-Debugging Code Generation | - | 0
ACECODER: Acing Coder RL via Automated Test-Case Synthesis | - | 0
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | - | 0
CoCoNUT: Structural Code Understanding does not fall out of a tree | Code | 0
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | - | 0
Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs | - | 0
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks | - | 0
Dafny as Verification-Aware Intermediate Language for Code Generation | - | 0
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | - | 0
Dynamic Scaling of Unit Tests for Code Reward Modeling | - | 0
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | - | 0
Page 6 of 11

No leaderboard results yet.