| Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | May 30, 2025 | Benchmarking, Code Repair | Unverified | 0 |
| CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Jul 14, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | May 26, 2024 | Code Repair, Language Modeling | Unverified | 0 |
| Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models | Jan 13, 2024 | Code Generation, Code Repair | Unverified | 0 |
| CrashFixer: A crash resolution agent for the Linux kernel | Apr 29, 2025 | Code Repair | Unverified | 0 |
| DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models | Feb 19, 2024 | Code Repair, Few-Shot Learning | Unverified | 0 |
| Investigating the Transferability of Code Repair for Low-Resource Programming Languages | Jun 21, 2024 | Code Generation, Code Repair | Unverified | 0 |
| Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors | Mar 28, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair | Sep 19, 2024 | Code Generation, Code Repair | Code Available | 0 |