SOTAVerified

Code Repair

Papers

Showing 1-10 of 39 papers

Title | Status | Hype
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | - | 0
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving | Code | 3
Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | - | 0
CrashFixer: A crash resolution agent for the Linux kernel | - | 0
How Accurately Do Large Language Models Understand Code? | - | 0
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors | Code | 0
RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts | - | 0
SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair | - | 0
AuPair: Golden Example Pairs for Code Repair | - | 0
Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NSEdit | Accuracy (medium) | 13.87 | - | Unverified
2 | CodeBERT | Accuracy (medium) | 5.2 | - | Unverified