| CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | Oct 6, 2023 | Benchmarking, Domain Generalization | —Unverified | 0 |
| CMOS based image cytometry for detection of phytoplankton in ballast water | Nov 21, 2016 | Benchmarking | —Unverified | 0 |
| Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment | Aug 6, 2019 | Atari Games, Benchmarking | —Unverified | 0 |
| Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos | Jan 1, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios | Apr 16, 2025 | Audio Deepfake Detection, Benchmarking | —Unverified | 0 |
| CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | Sep 20, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Benchmarking Causal Study to Interpret Large Language Models for Source Code | Aug 23, 2023 | Benchmarking, Causal Inference | —Unverified | 0 |
| A new dataset of dog breed images and a benchmark for fine-grained classification | Oct 1, 2020 | Benchmarking, Classification | —Unverified | 0 |
| Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images | Dec 4, 2024 | Benchmarking, Building Damage Assessment | —Unverified | 0 |
| An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models | May 23, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| Disambiguation in Conversational Question Answering in the Era of LLM: A Survey | May 18, 2025 | Benchmarking, Conversational Question Answering | —Unverified | 0 |
| Discriminative Link Prediction using Local Links, Node Features and Community Structure | Oct 17, 2013 | Benchmarking, Clustering | —Unverified | 0 |
| CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | Mar 29, 2025 | Benchmarking, Large Language Model | —Unverified | 0 |
| CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | Jul 14, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | Apr 19, 2025 | Benchmarking | —Unverified | 0 |
| CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | Jan 2, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis | Jul 1, 2021 | Benchmarking | —Unverified | 0 |
| CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Jul 14, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| DiPCo -- Dinner Party Corpus | Sep 30, 2019 | Benchmarking | —Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | —Unverified | 0 |
| ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | Dec 15, 2023 | Benchmarking, Classification | —Unverified | 0 |
| CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | Sep 9, 2024 | Benchmarking, knowledge editing | —Unverified | 0 |
| An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition | Oct 16, 2023 | Benchmarking, Micro Expression Recognition | —Unverified | 0 |