| SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | May 6, 2024 | Bug fixing, Language Modeling | Code Available | 11 |
| AutoCodeRover: Autonomous Program Improvement | Apr 8, 2024 | Bug fixing, Code Search | Code Available | 7 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | Oct 10, 2023 | Bug fixing, Code Generation | Code Available | 4 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | May 22, 2025 | Bug fixing, Chatbot | Code Available | 2 |
| CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking | Dec 1, 2024 | Bug fixing, Code Generation | Code Available | 2 |
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Oct 2, 2024 | Auto Debugging, Bug fixing | Code Available | 2 |
| CodeR: Issue Resolving with Multi-Agent and Task Graphs | Jun 3, 2024 | Bug fixing | Code Available | 2 |
| CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | Jul 4, 2025 | Bug fixing, Code Generation | Code Available | 1 |
| MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs | Nov 5, 2024 | Bug fixing, Code Generation | Code Available | 1 |
| Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests | Aug 21, 2024 | Bug fixing, Descriptive | Code Available | 1 |
| CoditT5: Pretraining for Source Code and Natural Language Editing | Aug 10, 2022 | Bug fixing, Language Modeling | Code Available | 1 |
| FixEval: Execution-based Evaluation of Program Fixes for Programming Problems | Jun 15, 2022 | Bug fixing | Code Available | 1 |
| RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation | Feb 12, 2022 | Authorship Attribution, Bug fixing | Code Available | 1 |
| Neural Transfer Learning for Repairing Security Vulnerabilities in C Code | Apr 16, 2021 | Bug fixing, C++ code | Code Available | 1 |
| D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis | Feb 16, 2021 | Bug fixing, Vulnerability Detection | Code Available | 1 |
| A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code | Oct 23, 2020 | Bug fixing, Code Completion | Code Available | 1 |
| Empirical Study of Transformers for Source Code | Oct 15, 2020 | Bug fixing, Code Completion | Code Available | 1 |
| The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | Jun 14, 2025 | Bug fixing, Inference Optimization | Unverified | 0 |
| LongCodeBench: Evaluating Coding LLMs at 1M Context Windows | May 12, 2025 | Bug fixing | Unverified | 0 |
| APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries | Apr 27, 2025 | Automated Theorem Proving, Bug fixing | Unverified | 0 |
| VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction | Apr 27, 2025 | Bug fixing | Unverified | 0 |
| On Simulation-Guided LLM-based Code Generation for Safe Autonomous Driving Software | Apr 2, 2025 | Autonomous Driving, Bug fixing | Unverified | 0 |
| Less is More: Adaptive Program Repair with Bug Localization and Preference Learning | Mar 9, 2025 | Bug fixing, Program Repair | Code Available | 0 |
| Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | Mar 7, 2025 | Benchmarking, Bug fixing | Unverified | 0 |