| MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Nov 28, 2016 | Benchmarking, Machine Reading Comprehension | Code Available | 1 |
| CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Nov 19, 2022 | Benchmarking, C++ code | Code Available | 1 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | May 30, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 |
| Can 3D Vision-Language Models Truly Understand Natural Language? | Mar 21, 2024 | Benchmarking, Diversity | Code Available | 1 |
| Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Oct 17, 2024 | All, Benchmarking | Code Available | 1 |
| AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Oct 31, 2024 | Benchmarking, Cloud Removal | Code Available | 1 |
| CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Dec 2, 2021 | Benchmarking, Ordinal Classification | Code Available | 1 |
| Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Oct 11, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| Multilingual Conceptual Coverage in Text-to-Image Models | Jun 2, 2023 | Benchmarking | Code Available | 1 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Jun 10, 2025 | Benchmarking, Data Augmentation | Code Available | 1 |