| DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Mar 20, 2023 | Benchmarking, De-identification | Code Available | 1 | 5 |
| AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Oct 31, 2024 | Benchmarking, Cloud Removal | Code Available | 1 | 5 |
| Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Mar 2, 2025 | Benchmarking, image-classification | Code Available | 1 | 5 |
| A Ladder of Causal Distances | May 5, 2020 | Benchmarking, Causal Discovery | Code Available | 1 | 5 |
| ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging | Apr 30, 2024 | Benchmarking, Image Reconstruction | Code Available | 1 | 5 |
| Benchmarking Multidomain English-Indonesian Machine Translation | May 1, 2020 | Benchmarking, Machine Translation | Code Available | 1 | 5 |
| Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks | Apr 5, 2022 | Benchmarking | Code Available | 1 | 5 |
| Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Apr 2, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Large Language Models for News Summarization | Jan 31, 2023 | Benchmarking, News Summarization | Code Available | 1 | 5 |
| RobFR: Benchmarking Adversarial Robustness on Face Recognition | Jul 8, 2020 | Adversarial Robustness, Benchmarking | Code Available | 1 | 5 |