| FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Sep 28, 2023 | BenchmarkingImage Retrieval | CodeCode Available | 1 |
| ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Mar 9, 2021 | BenchmarkingClassification | CodeCode Available | 1 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | BenchmarkingPhysical Simulations | CodeCode Available | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Feb 26, 2025 | BenchmarkingCode Generation | CodeCode Available | 1 |
| Benchmarking AI scientists in omics data-driven biological research | May 13, 2025 | BenchmarkingMultiple-choice | CodeCode Available | 1 |
| Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Feb 10, 2025 | Benchmarking | CodeCode Available | 1 |
| A Dataset for Answering Time-Sensitive Questions | Aug 13, 2021 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Algorithms for Federated Domain Generalization | Jul 11, 2023 | BenchmarkingDiversity | CodeCode Available | 1 |
| Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Feb 2, 2023 | BenchmarkingEvolutionary Algorithms | CodeCode Available | 1 |
| ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Jun 15, 2025 | Benchmarking | CodeCode Available | 1 |