| Benchmarking Scientific Image Forgery Detectors | May 26, 2021 | Benchmarking | Unverified | 0 |
| Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam | Apr 9, 2021 | Benchmarking, Scene Text Recognition | Unverified | 0 |
| GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra | Jun 9, 2025 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| Benchmarking Sample Selection Strategies for Batch Reinforcement Learning | Sep 29, 2021 | Benchmarking, Imitation Learning | Unverified | 0 |
| A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | Feb 28, 2023 | Adversarial Robustness, Benchmarking | Unverified | 0 |
| GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | Feb 19, 2025 | Benchmarking | Unverified | 0 |
| Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation | Dec 16, 2021 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | Jan 10, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| 7th AI Driving Olympics: 1st Place Report for Panoptic Tracking | Dec 9, 2021 | Benchmarking, Panoptic Segmentation | Unverified | 0 |
| Geospatial Foundation Models to Enable Progress on Sustainable Development Goals | May 30, 2025 | Benchmarking, Earth Observation | Unverified | 0 |
| A Theory of Dynamic Benchmarks | Oct 6, 2022 | Benchmarking | Unverified | 0 |
| GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | Jul 25, 2024 | Benchmarking | Unverified | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | May 5, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | Unverified | 0 |
| A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness | May 5, 2023 | Benchmarking, Dataset Distillation | Unverified | 0 |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | Mar 27, 2023 | Benchmarking, Domain Adaptation | Unverified | 0 |
| Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management | Jun 19, 2023 | Benchmarking, Deep Reinforcement Learning | Unverified | 0 |
| Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | Mar 2, 2022 | Benchmarking, Deep Learning | Unverified | 0 |
| A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency | Sep 12, 2019 | Benchmarking, General Classification | Unverified | 0 |
| Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | Jan 15, 2025 | Benchmarking, Contrastive Learning | Unverified | 0 |
| Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | Jun 6, 2023 | Benchmarking, Depth Completion | Unverified | 0 |
| A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | Jun 17, 2024 | Benchmarking, Survey | Unverified | 0 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | Jun 3, 2023 | Benchmarking | Unverified | 0 |
| AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals | May 21, 2025 | Benchmarking, Chatbot | Unverified | 0 |
| Geometry-Based Next Frame Prediction from Monocular Video | Sep 20, 2016 | Autonomous Driving, Benchmarking | Unverified | 0 |