| Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks | Mar 15, 2024 | Adversarial Attack, Adversarial Robustness | —Unverified | 0 | 0 |
| OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents | Jun 19, 2025 | Benchmarking | —Unverified | 0 | 0 |
| oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | May 13, 2024 | Attribute, Autonomous Driving | —Unverified | 0 | 0 |
| Benchmarking Adversarial Robustness of Compressed Deep Learning Models | Aug 16, 2023 | Adversarial Robustness, Benchmarking | —Unverified | 0 | 0 |
| Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms | May 22, 2025 | Adversarial Attack, Benchmarking | —Unverified | 0 | 0 |
| Out of Distribution Performance of State of Art Vision Model | Jan 25, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Adversarial Robustness | Dec 26, 2019 | Adversarial Attack, Adversarial Robustness | —Unverified | 0 | 0 |
| Overconfident Oracles: Limitations of In Silico Sequence Design Benchmarking | Feb 24, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling | May 2, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving | May 1, 2014 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Adversarially Robust Quantum Machine Learning at Scale | Nov 23, 2022 | Adversarial Attack, Adversarial Attack Detection | —Unverified | 0 | 0 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | Benchmarking, Medical Visual Question Answering | —Unverified | 0 | 0 |
| Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking | May 23, 2022 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Benchmarking adversarial attacks and defenses for time-series data | Aug 30, 2020 | Adversarial Defense, Benchmarking | —Unverified | 0 | 0 |
| PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms | Oct 5, 2024 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches | Apr 22, 2024 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration | Sep 30, 2024 | Benchmarking, Intent Detection | —Unverified | 0 | 0 |
| Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances | Aug 3, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool | Jun 27, 2023 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis | Feb 21, 2025 | 3DGS, Autonomous Driving | —Unverified | 0 | 0 |
| Benchmarking Active Learning Strategies for Materials Optimization and Discovery | Apr 12, 2022 | Active Learning, Benchmarking | —Unverified | 0 | 0 |
| A critical analysis of metrics used for measuring progress in artificial intelligence | Aug 6, 2020 | Benchmarking | —Unverified | 0 | 0 |
| True Online TD-Replan(lambda) Achieving Planning through Replaying | Jan 31, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Active Learning for NILM | Nov 24, 2024 | Active Learning, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | Jan 13, 2025 | Articles, Benchmarking | —Unverified | 0 | 0 |
| Parsing Any Domain English text to CoNLL dependencies | May 1, 2012 | Benchmarking, Dependency Parsing | —Unverified | 0 | 0 |
| Trust but Verify: Programmatic VLM Evaluation in the Wild | Oct 17, 2024 | Benchmarking, Language Modelling | —Unverified | 0 | 0 |
| Participatory Personalization in Classification | Feb 8, 2023 | Benchmarking, Classification | —Unverified | 0 | 0 |
| 'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems | Nov 23, 2016 | Benchmarking, Object | —Unverified | 0 | 0 |
| When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques | May 22, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | Nov 5, 2023 | Benchmarking, image-classification | —Unverified | 0 | 0 |
| PASTA: A Dataset for Modeling Participant States in Narratives | Jul 31, 2022 | Benchmarking, Common Sense Reasoning | —Unverified | 0 | 0 |
| Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval | May 28, 2025 | Benchmarking, Recommendation Systems | —Unverified | 0 | 0 |
| PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database | Jun 23, 2021 | Benchmarking, Clustering | —Unverified | 0 | 0 |
| PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms | May 4, 2021 | Benchmarking | —Unverified | 0 | 0 |
| PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology | May 26, 2025 | Benchmarking, Prognosis | —Unverified | 0 | 0 |
| Patherea: Cell Detection and Classification for the 2020s | Dec 21, 2024 | Benchmarking, Cell Detection | —Unverified | 0 | 0 |
| A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | May 27, 2024 | Benchmarking | —Unverified | 0 | 0 |
| A Continuously Growing Dataset of Sentential Paraphrases | Aug 1, 2017 | Benchmarking, Paraphrase Identification | —Unverified | 0 | 0 |
| Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications | Jul 12, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | May 20, 2023 | Benchmarking | —Unverified | 0 | 0 |
| PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints | May 23, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Object Pose Estimation in Robotics Revisited | Jun 6, 2019 | 3D Pose Estimation, 6D Pose Estimation | —Unverified | 0 | 0 |
| Benchmarking 3D multi-coil NC-PDNet MRI reconstruction | Nov 8, 2024 | 3D Reconstruction, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking 3D Human Pose Estimation Models Under Occlusions | Apr 14, 2025 | 3D Human Pose Estimation, Benchmarking | —Unverified | 0 | 0 |
| IN-Sight: Interactive Navigation through Sight | Aug 1, 2024 | Benchmarking, Navigate | —Unverified | 0 | 0 |
| Benchmarking 2D Egocentric Hand Pose Datasets | Sep 11, 2024 | Activity Recognition, Benchmarking | —Unverified | 0 | 0 |
| Benchmark for Antibody Binding Affinity Maturation and Design | May 23, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Perception Test 2023: A Summary of the First Challenge And Outcome | Dec 20, 2023 | Benchmarking, Grounded Video Question Answering | —Unverified | 0 | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | —Unverified | 0 | 0 |