| Prompting Scientific Names for Zero-Shot Species Recognition | Oct 15, 2023 | Benchmarking, Zero-Shot Learning | —Unverified | 0 |
| Prompt Sketching for Large Language Models | Nov 8, 2023 | Arithmetic Reasoning, Benchmarking | —Unverified | 0 |
| Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | Apr 2, 2025 | Benchmarking | —Unverified | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | —Unverified | 0 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | Feb 10, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis | Jan 21, 2021 | Benchmarking | —Unverified | 0 |
| Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | May 13, 2022 | Benchmarking, reinforcement-learning | —Unverified | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Nov 7, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 |
| PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice | Feb 28, 2025 | Benchmarking, Diagnostic | —Unverified | 0 |
| PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | Jan 3, 2025 | Benchmarking | —Unverified | 0 |