| Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models | Apr 1, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Precise Model Benchmarking with Only a Few Observations | Oct 7, 2024 | Benchmarking, model | —Unverified | 0 | 0 |
| AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels | Aug 30, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization | May 15, 2025 | Benchmarking, Clustering | —Unverified | 0 | 0 |
| Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions | Jul 30, 2019 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index | Nov 28, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling | Jun 6, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Predicting the Performance of a Computing System with Deep Networks | Feb 27, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | Nov 17, 2023 | Benchmarking, Collision Avoidance | —Unverified | 0 | 0 |
| Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | Sep 5, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 | 0 |
| Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks | Mar 30, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos | Oct 10, 2018 | Benchmarking, Video Quality Assessment | —Unverified | 0 | 0 |
| Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds | Aug 10, 2017 | Benchmarking | —Unverified | 0 | 0 |
| Predictive Models from Quantum Computer Benchmarks | May 15, 2023 | Benchmarking, image-classification | —Unverified | 0 | 0 |
| Auto-tuning TensorFlow Threading Model for CPU Backend | Dec 4, 2018 | Benchmarking, CPU | —Unverified | 0 | 0 |
| Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection | Feb 28, 2022 | Benchmarking, Intrusion Detection | —Unverified | 0 | 0 |
| Benchmarking Machine Reading Comprehension: A Psychological Perspective | Apr 4, 2020 | Benchmarking, Machine Reading Comprehension | —Unverified | 0 | 0 |
| UCCIX: Irish-eXcellence Large Language Model | May 13, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Pretraining boosts out-of-domain robustness for pose estimation | Sep 24, 2019 | Animal Pose Estimation, Benchmarking | —Unverified | 0 | 0 |
| Who Said That? Benchmarking Social Media AI Detection | Oct 12, 2023 | Benchmarking, Misinformation | —Unverified | 0 | 0 |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | Jun 29, 2023 | Benchmarking, Robot Navigation | —Unverified | 0 | 0 |
| PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints | May 12, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | Apr 7, 2025 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | May 8, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | Feb 13, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery | Mar 27, 2019 | Benchmarking, Object | —Unverified | 0 | 0 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | May 10, 2024 | Benchmarking, Hyperparameter Optimization | —Unverified | 0 | 0 |
| Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide | Feb 20, 2025 | Adversarial Robustness, Benchmarking | —Unverified | 0 | 0 |
| ProBench: Benchmarking Large Language Models in Competitive Programming | Feb 28, 2025 | Attribute, Benchmarking | —Unverified | 0 | 0 |
| UCLID-Net: Single View Reconstruction in Object Space | Jun 6, 2020 | Benchmarking, Decoder | —Unverified | 0 | 0 |
| UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | Apr 18, 2023 | Benchmarking, Instance Segmentation | —Unverified | 0 | 0 |
| A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms | Dec 1, 2015 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| Automatic vehicle trajectory data reconstruction at scale | Dec 15, 2022 | Benchmarking, vehicle detection | —Unverified | 0 | 0 |
| Problem-solving benefits of down-sampled lexicase selection | Jun 10, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey | Jul 4, 2020 | Benchmarking, Survey | —Unverified | 0 | 0 |
| Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning | May 31, 2021 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |
| Procedural Generalization by Planning with Self-Supervised World Models | Nov 2, 2021 | Benchmarking, Meta-Learning | —Unverified | 0 | 0 |
| UGSL: A Unified Framework for Benchmarking Graph Structure Learning | Aug 21, 2023 | Benchmarking, Graph structure learning | —Unverified | 0 | 0 |
| ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | Jul 1, 2024 | Benchmarking, Question Generation | —Unverified | 0 | 0 |
| Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | Oct 6, 2023 | Benchmarking, Federated Learning | —Unverified | 0 | 0 |
| Progressive Class-level Distillation | May 30, 2025 | Benchmarking, Knowledge Distillation | —Unverified | 0 | 0 |
| Progressive Multi-view Human Mesh Recovery with Self-Supervision | Dec 10, 2022 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure | Sep 21, 2022 | Benchmarking, Image Inpainting | —Unverified | 0 | 0 |
| Projective simulation applied to the grid-world and the mountain-car problem | May 21, 2014 | Benchmarking, reinforcement-learning | —Unverified | 0 | 0 |
| Project MPG: towards a generalized performance benchmark for LLM capabilities | Oct 28, 2024 | Benchmarking, Chatbot | —Unverified | 0 | 0 |
| Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives | Feb 9, 2018 | Benchmarking, Image Segmentation | —Unverified | 0 | 0 |
| Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | Jan 25, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Prompting Scientific Names for Zero-Shot Species Recognition | Oct 15, 2023 | Benchmarking, Zero-Shot Learning | —Unverified | 0 | 0 |
| Automatic Microprocessor Performance Bug Detection | Nov 17, 2020 | Benchmarking | —Unverified | 0 | 0 |
| Prompt Sketching for Large Language Models | Nov 8, 2023 | Arithmetic Reasoning, Benchmarking | —Unverified | 0 | 0 |