SOTAVerified

HumanEval

Papers

Showing 26–50 of 264 papers

Title | Status | Hype
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | Code | 3
CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts | - | 0
Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis | - | 0
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | Code | 1
The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models | - | 0
ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | - | 0
DataDecide: How to Predict Best Pretraining Data with Small Experiments | Code | 3
Type-Constrained Code Generation with Language Models | - | 0
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | - | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | - | 0
Can LLMs Enable Verification in Mainstream Programming? | - | 0
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models | - | 0
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing | Code | 1
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | - | 0
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? | - | 0
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Code | 0
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | Code | 3
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | - | 0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | - | 0
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models | Code | 1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance | - | 0
MasRouter: Learning to Route LLMs for Multi-Agent Systems | Code | 2
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Code | 0
Page 2 of 11

No leaderboard results yet.