| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | Apr 5, 2025 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Jul 27, 2023 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Past as a Guide: Leveraging Retrospective Learning for Python Code Completion | Nov 13, 2023 | Code Completion, HumanEval | —Unverified | 0 | 0 |
| PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation | Dec 17, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic? | Oct 26, 2022 | HumanEval, Language Modelling | —Unverified | 0 | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | —Unverified | 0 | 0 |
| PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases | Jun 11, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |
| Prior Prompt Engineering for Reinforcement Fine-Tuning | May 20, 2025 | HumanEval, Prompt Engineering | —Unverified | 0 | 0 |
| Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | May 29, 2024 | HumanEval | —Unverified | 0 | 0 |
| Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models | Jun 20, 2024 | Code Generation, HumanEval | —Unverified | 0 | 0 |