| DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs | Mar 20, 2025 | Benchmarking, Hallucination | —Unverified | 0 |
| CMOS based image cytometry for detection of phytoplankton in ballast water | Nov 21, 2016 | Benchmarking | —Unverified | 0 |
| Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment | Aug 6, 2019 | Atari Games, Benchmarking | —Unverified | 0 |
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | May 2, 2024 | Benchmarking, Management | —Unverified | 0 |
| Addressing the Real-world Class Imbalance Problem in Dermatology | Oct 9, 2020 | Benchmarking, Few-Shot Learning | —Unverified | 0 |
| CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry | Jan 26, 2025 | Benchmarking, Object Detection | —Unverified | 0 |
| A new dataset of dog breed images and a benchmark for fine-grained classification | Oct 1, 2020 | Benchmarking, Classification | —Unverified | 0 |
| Benchmarking Automated Review Response Generation for the Hospitality Domain | Dec 1, 2020 | Benchmarking, Domain Adaptation | —Unverified | 0 |
| Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields | Aug 11, 2023 | Benchmarking | —Unverified | 0 |
| Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | Apr 28, 2023 | AutoML, Benchmarking | —Unverified | 0 |
| CIMLA: Interpretable AI for inference of differential causal networks | Apr 25, 2023 | Benchmarking | —Unverified | 0 |
| CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | Mar 29, 2025 | Benchmarking, Large Language Model | —Unverified | 0 |
| CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | Jul 14, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | Oct 6, 2023 | Benchmarking, Domain Generalization | —Unverified | 0 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | Apr 19, 2025 | Benchmarking | —Unverified | 0 |
| CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | Jan 2, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos | Jan 1, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios | Apr 16, 2025 | Audio Deepfake Detection, Benchmarking | —Unverified | 0 |
| CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | Jul 14, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | Sep 20, 2024 | Benchmarking, Language Modeling | —Unverified | 0 |
| Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images | Dec 4, 2024 | Benchmarking, Building Damage Assessment | —Unverified | 0 |
| An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models | May 23, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | Sep 9, 2024 | Benchmarking, knowledge editing | —Unverified | 0 |
| DLUE: Benchmarking Document Language Understanding | May 16, 2023 | Benchmarking, Document Classification | —Unverified | 0 |
| CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis | Jul 1, 2021 | Benchmarking | —Unverified | 0 |
| CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | —Unverified | 0 |
| Cognitive Model Priors for Predicting Human Decisions | May 22, 2019 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| Coherent Feed Forward Quantum Neural Network | Feb 1, 2024 | Benchmarking, Diagnostic | —Unverified | 0 |
| Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks | Apr 30, 2020 | Benchmarking, Coherence Evaluation | —Unverified | 0 |
| ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | Dec 15, 2023 | Benchmarking, Classification | —Unverified | 0 |
| An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition | Oct 16, 2023 | Benchmarking, Micro Expression Recognition | —Unverified | 0 |
| Diverse Community Data for Benchmarking Data Privacy Algorithms | Jun 20, 2023 | Benchmarking | —Unverified | 0 |
| ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models | May 18, 2025 | Articles, Benchmarking | —Unverified | 0 |
| An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction | Nov 3, 2023 | Benchmarking, Sentence | —Unverified | 0 |
| Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | Jun 17, 2022 | Benchmarking, Depth Estimation | —Unverified | 0 |
| User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | Aug 4, 2024 | Action Anticipation, Benchmarking | —Unverified | 0 |
| ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | Apr 27, 2023 | Articles, Benchmarking | —Unverified | 0 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | Apr 21, 2022 | Attribute, Benchmarking | —Unverified | 0 |
| Distribution-Based Invariant Deep Networks for Learning Meta-Features | Jun 24, 2020 | Benchmarking, General Classification | —Unverified | 0 |
| Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories | Nov 7, 2022 | 3D Reconstruction, 4D reconstruction | —Unverified | 0 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | Sep 17, 2021 | Attribute, Benchmarking | —Unverified | 0 |
| ChatGPT Alternative Solutions: Large Language Models Survey | Mar 21, 2024 | Benchmarking, Chatbot | —Unverified | 0 |
| Commute Graph Neural Networks | Jun 30, 2024 | Benchmarking | —Unverified | 0 |
| An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | Dec 2, 2023 | Benchmarking | —Unverified | 0 |
| Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts | May 23, 2025 | Benchmarking | —Unverified | 0 |
| Distributed Training Large-Scale Deep Architectures | Aug 10, 2017 | Benchmarking, Deep Learning | —Unverified | 0 |
| Sensitivity analysis and experimental evaluation of PID-like continuous sliding mode control | Aug 13, 2022 | Benchmarking, Sensitivity | —Unverified | 0 |