| Introducing Milabench: Benchmarking Accelerators for AI | Nov 18, 2024 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| Introducing the VoicePrivacy Initiative | May 4, 2020 | Benchmarking | Code Available | 1 | 5 |
| GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Apr 30, 2025 | 3D Molecule Generation, Benchmarking | Code Available | 1 | 5 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 | 5 |
| An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Sep 4, 2019 | Benchmarking, General Classification | Code Available | 1 | 5 |
| Benchmarking Batch Deep Reinforcement Learning Algorithms | Oct 3, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 | 5 |
| Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | May 26, 2025 | Benchmarking, RAG | Code Available | 1 | 5 |
| EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search | Nov 24, 2021 | Benchmarking, Neural Architecture Search | Code Available | 1 | 5 |
| Emoji Prediction: Extensions and Benchmarking | Jul 14, 2020 | Benchmarking, Multi-Label Classification | Code Available | 1 | 5 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Apr 21, 2023 | Benchmarking | Code Available | 1 | 5 |