SOTAVerified

Program Repair

The task of training ML models to fix a bug in a given program by modifying its existing code.
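As a rough illustration of the task, a repair instance can be thought of as a buggy program plus a set of tests, with a candidate fix accepted when it passes them. The sketch below is a minimal, hypothetical example: the function names, the bug, and the tests are invented for illustration and are not taken from any paper or benchmark listed on this page.

```python
from typing import Callable, Optional, Sequence

# Hypothetical buggy program: it should return the largest element,
# but the comparison is inverted, so it returns the smallest.
def buggy_max(values: list) -> int:
    best = values[0]
    for v in values[1:]:
        if v < best:  # bug: should be `>`
            best = v
    return best

# Hypothetical candidate patch, e.g. as a repair model might propose it.
def patched_max(values: list) -> int:
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

# Illustrative test suite: (input, expected output) pairs.
TESTS = [([1, 3, 2], 3), ([5], 5), ([-1, -7], -1)]

def passes_all(fn: Callable, tests: Sequence) -> bool:
    """Return True if fn gives the expected output on every test."""
    for args, expected in tests:
        try:
            if fn(list(args)) != expected:
                return False
        except Exception:
            return False
    return True

def first_plausible_patch(candidates: Sequence, tests: Sequence) -> Optional[Callable]:
    """Return the first candidate that passes all tests, or None."""
    for candidate in candidates:
        if passes_all(candidate, tests):
            return candidate
    return None
```

Passing the tests only makes a patch plausible, not necessarily correct; test-based validation is used here as an assumption to keep the sketch short.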

Papers

Showing 1-50 of 132 papers

Title | Status | Hype
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | | 0
T^3: Multi-level Tree-based Automatic Program Repair with Large Language Models | | 0
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | | 0
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | | 0
SemAgent: A Semantics Aware Program Repair Agent | | 0
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair | | 0
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | | 0
Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces | | 0
Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data | | 0
Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs | | 0
The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models | | 0
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs | | 0
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn't | | 0
CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching | Code | 1
Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing | | 0
Evaluating the Generalizability of LLMs in Automated Program Repair | | 0
Less is More: Adaptive Program Repair with Bug Localization and Preference Learning | Code | 0
Where's the Bug? Attention Probing for Scalable Fault Localization | | 0
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks | | 0
Agentic Bug Reproduction for Effective Automated Program Repair at Google | | 0
o3-mini vs DeepSeek-R1: Which One is Safer? | Code | 1
Evaluating Agent-based Program Repair at Google | | 0
The Impact of Input Order Bias on Large Language Models for Software Fault Localization | | 0
Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization | | 0
Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | Code | 1
Planning-Driven Programming: A Large Language Model Programming Workflow | Code | 1
A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation | | 0
MdEval: Massively Multilingual Code Debugging | | 0
Semantic-guided Search for Efficient Program Repair with Large Language Models | | 0
Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code | | 0
In-Context Code-Text Learning for Bimodal Software Engineering | | 0
Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench | | 0
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Code | 2
RepairBench: Leaderboard of Frontier Models for Program Repair | Code | 1
Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs | Code | 0
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale | Code | 3
Enhancing Automated Program Repair with Solution Design | | 0
RePair: Automated Program Repair with Process-based Feedback | Code | 0
MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | | 0
SpecRover: Code Intent Extraction via LLMs | | 0
Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models | | 0
Agentless: Demystifying LLM-based Software Engineering Agents | Code | 7
NARRepair: Non-Autoregressive Code Generation Model for Automatic Program Repair | | 0
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning | Code | 1
Benchmarking Educational Program Repair | Code | 0
Automated Program Repair: Emerging trends pose and expose problems for benchmarks | | 0
Automatic Programming: Large Language Models and Beyond | | 0
NExT: Teaching Large Language Models to Reason about Code Execution | | 0
Aligning the Objective of LLM-based Program Repair | Code | 1
AutoCodeRover: Autonomous Program Improvement | Code | 7

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | DrRepair + BIFI | Average Success Rate | 71.7 | | Unverified
2 | DrRepair | Average Success Rate | 68.2 | | Unverified
3 | SampleFix | Average Success Rate | 45.3 | | Unverified
4 | RLAssist | Average Success Rate | 26.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Transformer + BIFI | Accuracy (%) | 90.5 | | Unverified
2 | Transformer | Accuracy (%) | 62 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MGDebugger (DeepSeek-Coder-V2-Lite) | Pass@1 | 97.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TFix | Error Removal | 678 | | Unverified