| BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning | Feb 23, 2025 | Benchmarking | Code Available | 1 | 5 |
| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Aug 19, 2024 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Dec 9, 2022 | Benchmarking, Disentanglement | Code Available | 1 | 5 |
| EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Oct 31, 2024 | Benchmarking, Electromyography (EMG) | Code Available | 1 | 5 |
| Benchmarking AI scientists in omics data-driven biological research | May 13, 2025 | Benchmarking, Multiple-choice | Code Available | 1 | 5 |
| HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Sep 25, 2024 | Benchmarking, Image Dehazing | Code Available | 1 | 5 |
| ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Apr 1, 2023 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 1 | 5 |
| Benchmarking Algorithms for Federated Domain Generalization | Jul 11, 2023 | Benchmarking, Diversity | Code Available | 1 | 5 |
| Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Feb 2, 2023 | Benchmarking, Evolutionary Algorithms | Code Available | 1 | 5 |
| GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning | Feb 3, 2024 | Benchmarking, DeepFake Detection | Code Available | 1 | 5 |
| A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Mar 10, 2020 | Benchmarking, Entity Alignment | Code Available | 1 | 5 |
| 4D Panoptic LiDAR Segmentation | Feb 24, 2021 | 4D Panoptic Segmentation, Benchmarking | Code Available | 1 | 5 |
| Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images | Dec 8, 2023 | Benchmarking, Object | Code Available | 1 | 5 |
| Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Jun 21, 2023 | 3D-Aware Image Synthesis, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Sep 25, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| Benchmarking LLMs for Political Science: A United Nations Perspective | Feb 19, 2025 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| B-Pref: Benchmarking Preference-Based Reinforcement Learning | Nov 4, 2021 | Benchmarking, reinforcement-learning | Code Available | 1 | 5 |
| Benchmarking and Analyzing Point Cloud Classification under Corruptions | Feb 7, 2022 | Benchmarking, Classification | Code Available | 1 | 5 |
| Benchmarking LLMs' Swarm intelligence | May 7, 2025 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Apr 21, 2023 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking of DL Libraries and Models on Mobile Devices | Feb 14, 2022 | Benchmarking, GPU | Code Available | 1 | 5 |
| BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Apr 28, 2025 | Benchmarking | Code Available | 1 | 5 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | Benchmarking, Math | Code Available | 1 | 5 |
| Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks | Apr 5, 2022 | Benchmarking | Code Available | 1 | 5 |
| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Oct 21, 2022 | Benchmarking, Community Detection | Code Available | 1 | 5 |