| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Oct 6, 2023 | Code GenerationDecision Making | CodeCode Available | 2 |
| Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions | Dec 20, 2022 | Automated Theorem ProvingCode Generation | CodeCode Available | 2 |
| MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Aug 17, 2022 | BenchmarkingCode Generation | CodeCode Available | 2 |
| CodeT: Code Generation with Generated Tests | Jul 21, 2022 | Code GenerationHumanEval | CodeCode Available | 2 |
| Rethinking Verification for LLM Code Generation: From Generation to Testing | Jul 9, 2025 | Code GenerationHumanEval | CodeCode Available | 1 |
| Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking | May 20, 2025 | HumanEvalmbpp | CodeCode Available | 1 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| Rethinking Repetition Problems of LLMs in Code Generation | May 15, 2025 | Code GenerationHumanEval | CodeCode Available | 1 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | May 5, 2025 | Code GenerationGSM8K | CodeCode Available | 1 |
| RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing | Mar 10, 2025 | Code GenerationHumanEval | CodeCode Available | 1 |