| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Jul 8, 2021 | Benchmarking | Code Available | 1 | 5 |
| EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Dec 11, 2023 | Benchmarking, Human-Object Interaction Detection | Code Available | 1 | 5 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Dec 9, 2022 | Benchmarking, Disentanglement | Code Available | 1 | 5 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | May 30, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| Introducing Milabench: Benchmarking Accelerators for AI | Nov 18, 2024 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search | Nov 24, 2021 | Benchmarking, Neural Architecture Search | Code Available | 1 | 5 |
| Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | May 22, 2023 | Benchmarking, Hallucination | Code Available | 1 | 5 |
| BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Nov 21, 2023 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Oct 31, 2024 | Benchmarking, Electromyography (EMG) | Code Available | 1 | 5 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Jun 10, 2025 | Benchmarking, Data Augmentation | Code Available | 1 | 5 |