| animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Jun 3, 2024 | Audio ClassificationBenchmarking | CodeCode Available | 1 | 5 |
| AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Dec 15, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 1 | 5 |
| An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Mar 15, 2024 | BenchmarkingDrug Discovery | CodeCode Available | 1 | 5 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Nov 29, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Nov 15, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action RecognitionAttribute | CodeCode Available | 1 | 5 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classificationBenchmarking | CodeCode Available | 1 | 5 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 | 5 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | CodeCode Available | 1 | 5 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | CodeCode Available | 1 | 5 |