| Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Feb 9, 2021 | Benchmarking, Q-Learning | Code Available | 1 |
| AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | May 7, 2024 | Benchmarking, Cancer Classification | Code Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | Benchmarking, Causal Discovery | Code Available | 1 |
| Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Jul 19, 2022 | Benchmarking, Experimental Design | Code Available | 1 |
| In Search of Lost Online Test-time Adaptation: A Survey | Oct 31, 2023 | Benchmarking, GPU | Code Available | 1 |
| Insights from Benchmarking Frontier Language Models on Web App Code Generation | Sep 8, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| A Survey of Pathology Foundation Model: Progress and Future Directions | Apr 5, 2025 | Benchmarking, Multiple Instance Learning | Code Available | 1 |
| A Comprehensive Benchmark for RNA 3D Structure-Function Modeling | Mar 27, 2025 | Benchmarking, Deep Learning | Code Available | 1 |
| GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Apr 30, 2025 | 3D Molecule Generation, Benchmarking | Code Available | 1 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | Benchmarking, DeepFake Detection | Code Available | 1 |