| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Oct 18, 2024 | Autonomous NavigationBenchmarking | CodeCode Available | 1 | 5 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | BenchmarkingContinual Learning | CodeCode Available | 1 | 5 |
| A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification | Nov 28, 2022 | Benchmarkingimage-classification | CodeCode Available | 1 | 5 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | BenchmarkingCloud Computing | CodeCode Available | 1 | 5 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Oct 12, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | BenchmarkingDepth Completion | CodeCode Available | 1 | 5 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | BenchmarkingPhysical Simulations | CodeCode Available | 1 | 5 |
| Benchmarking Deep Models for Salient Object Detection | Feb 7, 2022 | BenchmarkingObject | CodeCode Available | 1 | 5 |
| AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Nov 15, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Nov 29, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Dec 15, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 1 | 5 |
| Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Oct 6, 2024 | BenchmarkingDiagnostic | CodeCode Available | 1 | 5 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | BenchmarkingCausal Discovery | CodeCode Available | 1 | 5 |
| An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks | Feb 7, 2025 | BenchmarkingMulti-agent Reinforcement Learning | CodeCode Available | 1 | 5 |
| Benchmarking Deep Learning Interpretability in Time Series Predictions | Oct 26, 2020 | BenchmarkingDeep Learning | CodeCode Available | 1 | 5 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | BenchmarkingDeepFake Detection | CodeCode Available | 1 | 5 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | BenchmarkingGPU | CodeCode Available | 1 | 5 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Jun 14, 2020 | BenchmarkingDeep Reinforcement Learning | CodeCode Available | 1 | 5 |
| An Exploration of Embodied Visual Exploration | Jan 7, 2020 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Data Science Agents | Feb 27, 2024 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action RecognitionAttribute | CodeCode Available | 1 | 5 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | BenchmarkingPrompt Engineering | CodeCode Available | 1 | 5 |