| ACECODER: Acing Coder RL via Automated Test-Case Synthesis | Feb 3, 2025 | HumanEval, MBPP | Unverified | 0 |
| Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | Jan 31, 2025 | Code Generation, Hallucination | Unverified | 0 |
| CoCoNUT: Structural Code Understanding does not fall out of a tree | Jan 27, 2025 | Code Generation, HumanEval | Code Available | 0 |
| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | Jan 20, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs | Jan 14, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks | Jan 11, 2025 | Code Generation, HumanEval | Unverified | 0 |
| Dafny as Verification-Aware Intermediate Language for Code Generation | Jan 10, 2025 | Code Generation, HumanEval | Unverified | 0 |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | Jan 6, 2025 | GSM8K, HumanEval | Unverified | 0 |
| Dynamic Scaling of Unit Tests for Code Reward Modeling | Jan 2, 2025 | Code Generation, HumanEval | Unverified | 0 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | Benchmarking, Code Generation | Unverified | 0 |