| CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | Jul 4, 2025 | Bug fixing, Code Generation | Code Available | 1 |
| Use Property-Based Testing to Bridge LLM Code Generation and Validation | Jun 23, 2025 | Code Generation, test driven development | Unverified | 0 |
| SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner | Jun 10, 2025 | test driven development | Code Available | 1 |
| Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems | Jun 4, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation | May 13, 2025 | Code Generation, In-Context Learning | Unverified | 0 |
| Otter: Generating Tests from Issues to Validate SWE Patches | Feb 7, 2025 | test driven development | Code Available | 1 |
| From Defects to Demands: A Unified, Iterative, and Heuristically Guided LLM-Based Framework for Automated Software Repair and Requirement Realization | Dec 6, 2024 | Ingenuity, test driven development | Unverified | 0 |
| TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved? | Dec 3, 2024 | test driven development | Code Available | 1 |
| Open Source Evolutionary Computation with Chips-n-Salsa | Dec 2, 2024 | Evolutionary Algorithms, test driven development | Unverified | 0 |
| Evaluation-Driven Development of LLM Agents: A Process Model and Reference Architecture | Nov 21, 2024 | test driven development | Unverified | 0 |