| Benchopt: Reproducible, efficient and collaborative optimization benchmarks | Jun 27, 2022 | Benchmarking, image-classification | Code Available | 4 |
| RecBole 2.0: Towards a More Up-to-Date Recommendation Library | Jun 15, 2022 | Benchmarking, Data Augmentation | Code Available | 4 |
| Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets | Mar 9, 2022 | Benchmarking, Graph Regression | Code Available | 4 |
| TabArena: A Living Benchmark for Machine Learning on Tabular Data | Jun 20, 2025 | Benchmarking | Code Available | 3 |
| ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications | Jun 14, 2025 | Benchmarking | Code Available | 3 |
| ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation | May 24, 2025 | Benchmarking, Chart Understanding | Code Available | 3 |
| AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | May 22, 2025 | Benchmarking, Fairness | Code Available | 3 |
| IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models | May 22, 2025 | Benchmarking, Instruction Following | Code Available | 3 |
| OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking | May 20, 2025 | Benchmarking | Code Available | 3 |
| Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking | May 16, 2025 | Benchmarking, Management | Code Available | 3 |