Code Repair

Papers

Showing 1–10 of 39 papers

Title	Date	Tasks	Status	Hype
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks	Jul 14, 2025	Benchmarking, Code Generation	Unverified	0
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving	Jul 8, 2025	Code Repair, Transfer Learning	Code Available	3
Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents	May 30, 2025	Benchmarking, Code Repair	Unverified	0
CrashFixer: A crash resolution agent for the Linux kernel	Apr 29, 2025	Code Repair	Unverified	0
How Accurately Do Large Language Models Understand Code?	Apr 6, 2025	Code Generation, Code Repair	Unverified	0
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors	Mar 28, 2025	Benchmarking, Code Generation	Code Available	0
RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts	Mar 27, 2025	Code Repair, Feature Engineering	Unverified	0
SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair	Mar 3, 2025	Code Completion, Code Repair	Unverified	0
AuPair: Golden Example Pairs for Code Repair	Feb 12, 2025	Code Repair, In-Context Learning	Unverified	0
Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration	Dec 27, 2024	C++ code, Code Repair	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NSEdit	Accuracy (medium)	13.87	—	Unverified
2	CodeBERT	Accuracy (medium)	5.2	—	Unverified