| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | BenchmarkingPhysical Simulations | CodeCode Available | 1 |
| Bag of Tricks for Adversarial Training | Oct 1, 2020 | Adversarial RobustnessBenchmarking | CodeCode Available | 1 |
| DFGC 2021: A DeepFake Game Competition | Jun 2, 2021 | BenchmarkingDeepFake Detection | CodeCode Available | 1 |
| DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models | May 20, 2025 | BenchmarkingDiagnostic | CodeCode Available | 1 |
| DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations | Oct 17, 2023 | BenchmarkingEmotion Recognition | CodeCode Available | 1 |
| A Ladder of Causal Distances | May 5, 2020 | BenchmarkingCausal Discovery | CodeCode Available | 1 |
| Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Nov 5, 2023 | Benchmarking | CodeCode Available | 1 |
| Disentangled Feature Representation for Few-shot Image Classification | Sep 26, 2021 | BenchmarkingClassification | CodeCode Available | 1 |
| DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Oct 3, 2024 | BenchmarkingImitation Learning | CodeCode Available | 1 |
| ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Apr 30, 2024 | BenchmarkingImage Reconstruction | CodeCode Available | 1 |
| Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Apr 2, 2024 | Benchmarking | CodeCode Available | 1 |
| CODEMENV: Benchmarking Large Language Models on Code Migration | Jun 1, 2025 | Benchmarking | CodeCode Available | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | CodeCode Available | 1 |
| Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods | Sep 17, 2022 | BenchmarkingStereo Matching | CodeCode Available | 1 |
| Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Sep 23, 2021 | BenchmarkingResponse Generation | CodeCode Available | 1 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classificationBenchmarking | CodeCode Available | 1 |
| Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? | Apr 29, 2024 | Answer GenerationBenchmarking | CodeCode Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action RecognitionAttribute | CodeCode Available | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | CodeCode Available | 1 |
| Benchmarking AI scientists in omics data-driven biological research | May 13, 2025 | BenchmarkingMultiple-choice | CodeCode Available | 1 |
| Beacon, a lightweight deep reinforcement learning benchmark library for flow control | Feb 27, 2024 | BenchmarkingCPU | CodeCode Available | 1 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | CodeCode Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | BenchmarkingCausal Discovery | CodeCode Available | 1 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Jun 1, 2022 | BenchmarkingEmotion Recognition | CodeCode Available | 1 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | BenchmarkingPrompt Engineering | CodeCode Available | 1 |