| CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Sep 16, 2020 | BenchmarkingKnowledge Graph Completion | CodeCode Available | 1 |
| A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Mar 10, 2020 | BenchmarkingEntity Alignment | CodeCode Available | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | CodeCode Available | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | BenchmarkingContinual Learning | CodeCode Available | 1 |
| A Dataset for Answering Time-Sensitive Questions | Aug 13, 2021 | Benchmarking | CodeCode Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | BenchmarkingCloud Computing | CodeCode Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | BenchmarkingPhysical Simulations | CodeCode Available | 1 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | BenchmarkingDepth Completion | CodeCode Available | 1 |
| Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Jun 11, 2025 | BenchmarkingComputational Efficiency | CodeCode Available | 1 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | BenchmarkingGPU | CodeCode Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | CodeCode Available | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | CodeCode Available | 1 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classificationBenchmarking | CodeCode Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action RecognitionAttribute | CodeCode Available | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | CodeCode Available | 1 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Aug 12, 2021 | BenchmarkingMedical Image Analysis | CodeCode Available | 1 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | CodeCode Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | BenchmarkingCausal Discovery | CodeCode Available | 1 |
| CharacterBench: Benchmarking Character Customization of Large Language Models | Dec 16, 2024 | Benchmarking | CodeCode Available | 1 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | BenchmarkingPrompt Engineering | CodeCode Available | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous DrivingAutonomous Vehicles | CodeCode Available | 1 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Jun 20, 2023 | BenchmarkingDeep Learning | CodeCode Available | 1 |
| Chaos as an interpretable benchmark for forecasting and data-driven modelling | Oct 11, 2021 | BenchmarkingSymbolic Regression | CodeCode Available | 1 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | BenchmarkingDeepFake Detection | CodeCode Available | 1 |