| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | BenchmarkingDeepFake Detection | CodeCode Available | 1 | 5 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | BenchmarkingPhysical Simulations | CodeCode Available | 1 | 5 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classificationBenchmarking | CodeCode Available | 1 | 5 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Data Science Agents | Feb 27, 2024 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Nov 15, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | CodeCode Available | 1 | 5 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action RecognitionAttribute | CodeCode Available | 1 | 5 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Nov 29, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Dec 15, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 1 | 5 |