| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | Jul 3, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| T^3: Multi-level Tree-based Automatic Program Repair with Large Language Models | Jun 26, 2025 | Program Repair | Unverified | 0 |
| Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | Jun 23, 2025 | Large Language Model, Program Repair | Unverified | 0 |
| Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | Jun 20, 2025 | Program Repair | Unverified | 0 |
| SemAgent: A Semantics Aware Program Repair Agent | Jun 19, 2025 | Program Repair | Unverified | 0 |
| A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair | Jun 5, 2025 | Program Repair, Vulnerability Detection | Unverified | 0 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | May 27, 2025 | Code Generation, Code Summarization | Unverified | 0 |
| Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces | May 23, 2025 | Program Repair | Unverified | 0 |
| Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data | May 12, 2025 | Program Repair, Synthetic Data Generation | Unverified | 0 |
| Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs | May 7, 2025 | Program Repair | Unverified | 0 |
| The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models | May 5, 2025 | HumanEval, Program Repair | Unverified | 0 |
| SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs | Apr 20, 2025 | Program Repair | Unverified | 0 |
| Using ML filters to help automated vulnerability repairs: when it helps and when it doesn't | Apr 9, 2025 | Program Repair, Vulnerability Detection | Unverified | 0 |
| CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching | Mar 28, 2025 | Program Repair | Code Available | 1 |
| Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing | Mar 20, 2025 | Fairness, Program Repair | Unverified | 0 |
| Evaluating the Generalizability of LLMs in Automated Program Repair | Mar 12, 2025 | Program Repair, Prompt Engineering | Unverified | 0 |
| Less is More: Adaptive Program Repair with Bug Localization and Preference Learning | Mar 9, 2025 | Bug fixing, Program Repair | Code Available | 0 |
| Where's the Bug? Attention Probing for Scalable Fault Localization | Feb 19, 2025 | Fault localization, Program Repair | Unverified | 0 |
| LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks | Feb 10, 2025 | Code Generation, Program Repair | Unverified | 0 |
| Agentic Bug Reproduction for Effective Automated Program Repair at Google | Feb 3, 2025 | Large Language Model, Program Repair | Unverified | 0 |
| o3-mini vs DeepSeek-R1: Which One is Safer? | Jan 30, 2025 | Code Generation, Program Repair | Code Available | 1 |
| Evaluating Agent-based Program Repair at Google | Jan 13, 2025 | Code Generation, Program Repair | Unverified | 0 |
| The Impact of Input Order Bias on Large Language Models for Software Fault Localization | Dec 25, 2024 | Fault localization, Memorization | Unverified | 0 |
| Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization | Dec 19, 2024 | Fault localization, Program Repair | Unverified | 0 |
| Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | Dec 5, 2024 | Fault localization, Program Repair | Code Available | 1 |
| Planning-Driven Programming: A Large Language Model Programming Workflow | Nov 21, 2024 | Code Generation, HumanEval | Code Available | 1 |
| A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation | Nov 12, 2024 | Bug fixing, Code Generation | Unverified | 0 |
| MdEval: Massively Multilingual Code Debugging | Nov 4, 2024 | Program Repair | Unverified | 0 |
| Semantic-guided Search for Efficient Program Repair with Large Language Models | Oct 22, 2024 | GPU, HumanEval | Unverified | 0 |
| Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code | Oct 13, 2024 | Code Generation, Hallucination | Unverified | 0 |
| In-Context Code-Text Learning for Bimodal Software Engineering | Oct 8, 2024 | Clone Detection, In-Context Learning | Unverified | 0 |
| Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench | Oct 6, 2024 | Program Repair, valid | Unverified | 0 |
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Oct 2, 2024 | Auto Debugging, Bug fixing | Code Available | 2 |
| RepairBench: Leaderboard of Frontier Models for Program Repair | Sep 27, 2024 | Program Repair | Code Available | 1 |
| Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs | Sep 16, 2024 | All, Program Repair | Code Available | 0 |
| HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale | Sep 9, 2024 | Code Generation, Fault localization | Code Available | 3 |
| Enhancing Automated Program Repair with Solution Design | Aug 22, 2024 | Program Repair | Unverified | 0 |
| RePair: Automated Program Repair with Process-based Feedback | Aug 21, 2024 | Program Repair | Code Available | 0 |
| MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | Aug 18, 2024 | parameter-efficient fine-tuning, Program Repair | Unverified | 0 |
| SpecRover: Code Intent Extraction via LLMs | Aug 5, 2024 | Code Search, Large Language Model | Unverified | 0 |
| Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models | Jul 4, 2024 | C++ code, Code Generation | Unverified | 0 |
| Agentless: Demystifying LLM-based Software Engineering Agents | Jul 1, 2024 | Program Repair | Code Available | 7 |
| NARRepair: Non-Autoregressive Code Generation Model for Automatic Program Repair | Jun 24, 2024 | Code Generation, Program Repair | Unverified | 0 |
| SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning | Jun 3, 2024 | Code Completion, Code Generation | Code Available | 1 |
| Benchmarking Educational Program Repair | May 8, 2024 | Benchmarking, Program Repair | Code Available | 0 |
| Automated Program Repair: Emerging trends pose and expose problems for benchmarks | May 8, 2024 | Machine Translation, Program Repair | Unverified | 0 |
| Automatic Programming: Large Language Models and Beyond | May 3, 2024 | Program Repair | Unverified | 0 |
| NExT: Teaching Large Language Models to Reason about Code Execution | Apr 23, 2024 | HumanEval, mbpp | Unverified | 0 |
| Aligning the Objective of LLM-based Program Repair | Apr 13, 2024 | Fault localization, Program Repair | Code Available | 1 |
| AutoCodeRover: Autonomous Program Improvement | Apr 8, 2024 | Bug fixing, Code Search | Code Available | 7 |