| Parsing Any Domain English text to CoNLL dependencies | May 1, 2012 | Benchmarking, Dependency Parsing | —Unverified | 0 | 0 |
| Trust but Verify: Programmatic VLM Evaluation in the Wild | Oct 17, 2024 | Benchmarking, Language Modelling | —Unverified | 0 | 0 |
| Participatory Personalization in Classification | Feb 8, 2023 | Benchmarking, Classification | —Unverified | 0 | 0 |
| 'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems | Nov 23, 2016 | Benchmarking, Object | —Unverified | 0 | 0 |
| When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques | May 22, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | Nov 5, 2023 | Benchmarking, image-classification | —Unverified | 0 | 0 |
| PASTA: A Dataset for Modeling Participant States in Narratives | Jul 31, 2022 | Benchmarking, Common Sense Reasoning | —Unverified | 0 | 0 |
| Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval | May 28, 2025 | Benchmarking, Recommendation Systems | —Unverified | 0 | 0 |
| PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database | Jun 23, 2021 | Benchmarking, Clustering | —Unverified | 0 | 0 |
| PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms | May 4, 2021 | Benchmarking | —Unverified | 0 | 0 |
| PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology | May 26, 2025 | Benchmarking, Prognosis | —Unverified | 0 | 0 |
| Patherea: Cell Detection and Classification for the 2020s | Dec 21, 2024 | Benchmarking, Cell Detection | —Unverified | 0 | 0 |
| A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | May 27, 2024 | Benchmarking | —Unverified | 0 | 0 |
| A Continuously Growing Dataset of Sentential Paraphrases | Aug 1, 2017 | Benchmarking, Paraphrase Identification | —Unverified | 0 | 0 |
| Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications | Jul 12, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | May 20, 2023 | Benchmarking | —Unverified | 0 | 0 |
| PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints | May 23, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Object Pose Estimation in Robotics Revisited | Jun 6, 2019 | 3D Pose Estimation, 6D Pose Estimation | —Unverified | 0 | 0 |
| Benchmarking 3D multi-coil NC-PDNet MRI reconstruction | Nov 8, 2024 | 3D Reconstruction, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking 3D Human Pose Estimation Models Under Occlusions | Apr 14, 2025 | 3D Human Pose Estimation, Benchmarking | —Unverified | 0 | 0 |
| IN-Sight: Interactive Navigation through Sight | Aug 1, 2024 | Benchmarking, Navigate | —Unverified | 0 | 0 |
| Benchmarking 2D Egocentric Hand Pose Datasets | Sep 11, 2024 | Activity Recognition, Benchmarking | —Unverified | 0 | 0 |
| Benchmark for Antibody Binding Affinity Maturation and Design | May 23, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Perception Test 2023: A Summary of the First Challenge And Outcome | Dec 20, 2023 | Benchmarking, Grounded Video Question Answering | —Unverified | 0 | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | —Unverified | 0 | 0 |