| Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | Apr 2, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | —Unverified | 0 | 0 |
| A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design | May 19, 2025 | Benchmarking, Drug Discovery | —Unverified | 0 | 0 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | Feb 10, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis | Jan 21, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | May 13, 2022 | Benchmarking, reinforcement-learning | —Unverified | 0 | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Nov 7, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library | Aug 20, 2024 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| Automatic detection of passable roads after floods in remote sensed and social media data | Jan 10, 2019 | Benchmarking, Transfer Learning | —Unverified | 0 | 0 |
| PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice | Feb 28, 2025 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | Jan 3, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | Oct 11, 2023 | Benchmarking, Denoising | —Unverified | 0 | 0 |
| Automated Structured Radiology Report Generation | May 30, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | Jun 9, 2023 | Benchmarking, Time Series | —Unverified | 0 | 0 |
| PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | Sep 4, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Automated Machine Learning on Big Data using Stochastic Algorithm Tuning | Jul 30, 2014 | Bayesian Optimisation, Benchmarking | —Unverified | 0 | 0 |
| Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing | Jun 27, 2023 | Benchmarking | —Unverified | 0 | 0 |
| PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | Dec 16, 2024 | Benchmarking, Image Captioning | —Unverified | 0 | 0 |
| Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models | Dec 30, 2023 | Benchmarking, image-classification | —Unverified | 0 | 0 |
| Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A | Jun 1, 2015 | Benchmarking, Face Detection | —Unverified | 0 | 0 |
| Automated legal reasoning with discretion to act using s(LAW) | Jan 25, 2024 | Benchmarking, Legal Reasoning | —Unverified | 0 | 0 |
| Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models | Apr 1, 2025 | Benchmarking, Conversational Question Answering | —Unverified | 0 | 0 |
| Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | Jul 13, 2024 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |
| Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) | Nov 23, 2023 | Benchmarking, Brain Tumor Segmentation | —Unverified | 0 | 0 |
| PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy | Mar 18, 2021 | Art Analysis, Benchmarking | —Unverified | 0 | 0 |
| AutoLay: Benchmarking amodal layout estimation for autonomous driving | Aug 20, 2021 | Amodal Layout Estimation, Autonomous Driving | —Unverified | 0 | 0 |
| Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | Jun 16, 2022 | Benchmarking, Density Estimation | —Unverified | 0 | 0 |
| Python Random Graph Generator | Sep 20, 2017 | Benchmarking, Graph Generation | —Unverified | 0 | 0 |
| Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery | Jun 17, 2025 | Benchmarking, Drug Discovery | —Unverified | 0 | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Sep 30, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| AutoAI-TS: AutoAI for Time Series Forecasting | Feb 24, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| QDA^2: A principled approach to automatically annotating charge stability diagrams | Dec 18, 2023 | Benchmarking | —Unverified | 0 | 0 |
| A Universal Protocol to Benchmark Camera Calibration for Sports | Apr 15, 2024 | Benchmarking, Camera Calibration | —Unverified | 0 | 0 |
| A Unified Taylor Framework for Revisiting Attribution Methods | Aug 21, 2020 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| A Complementarity Analysis of the COCO Benchmark Problems and Artificially Generated Problems | Apr 27, 2021 | Benchmarking | —Unverified | 0 | 0 |
| QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges | Jun 24, 2025 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation | Nov 9, 2016 | Benchmarking, Translation | —Unverified | 0 | 0 |
| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | Aug 20, 2024 | Benchmarking, Language Modelling | —Unverified | 0 | 0 |
| QSAM-Net: Rain streak removal by quaternion neural network with self-attention module | Aug 8, 2022 | Benchmarking, object-detection | —Unverified | 0 | 0 |
| Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs | Feb 24, 2024 | Benchmarking, Knowledge Graphs | —Unverified | 0 | 0 |
| QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation | May 8, 2025 | Benchmarking, Federated Learning | —Unverified | 0 | 0 |
| Unbounded Bayesian Optimization via Regularization | Aug 14, 2015 | Bayesian Optimization, Benchmarking | —Unverified | 0 | 0 |
| Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling | Sep 24, 2024 | Articles, Benchmarking | —Unverified | 0 | 0 |
| Quality Assessment of Low Light Restored Images: A Subjective Study and an Unsupervised Model | Feb 4, 2022 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| Quality Assured: Rethinking Annotation Strategies in Imaging AI | Jul 24, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Quality at the Tail of Machine Learning Inference | Dec 25, 2022 | Autonomous Driving, Benchmarking | —Unverified | 0 | 0 |
| Uncertainty estimation for Cross-dataset performance in Trajectory prediction | May 15, 2022 | Benchmarking, Prediction | —Unverified | 0 | 0 |
| A Unified Study of Machine Learning Explanation Evaluation Metrics | Mar 27, 2022 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| QuantBench: Benchmarking AI Methods for Quantitative Investment | Apr 24, 2025 | Benchmarking, Continual Learning | —Unverified | 0 | 0 |
| Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling | Dec 15, 2020 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |