| CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | Jul 4, 2025 | Bug fixing, Code Generation | Code Available | 1 |
| The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | Jun 14, 2025 | Bug fixing, Inference Optimization | Unverified | 0 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | May 22, 2025 | Bug fixing, Chatbot | Code Available | 2 |
| LongCodeBench: Evaluating Coding LLMs at 1M Context Windows | May 12, 2025 | Bug fixing | Unverified | 0 |
| APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries | Apr 27, 2025 | Automated Theorem Proving, Bug fixing | Unverified | 0 |
| VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction | Apr 27, 2025 | Bug fixing | Unverified | 0 |
| On Simulation-Guided LLM-based Code Generation for Safe Autonomous Driving Software | Apr 2, 2025 | Autonomous Driving, Bug fixing | Unverified | 0 |
| Less is More: Adaptive Program Repair with Bug Localization and Preference Learning | Mar 9, 2025 | Bug fixing, Program Repair | Code Available | 0 |
| Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol | Mar 7, 2025 | Benchmarking, Bug fixing | Unverified | 0 |
| Empirical evaluation of LLMs in predicting fixes of Configuration bugs in Smart Home System | Feb 16, 2025 | Bug fixing | Unverified | 0 |