| EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | May 22, 2025 | Benchmarking | Code Available | 1 | 5 |
| IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics | Nov 7, 2021 | Bayesian Optimization, Benchmarking | Code Available | 1 | 5 |
| Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | May 13, 2021 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | Code Available | 1 | 5 |
| IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics | Jul 8, 2020 | Bayesian Optimization, Benchmarking | Code Available | 1 | 5 |
| AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Oct 31, 2024 | Benchmarking, Cloud Removal | Code Available | 1 | 5 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Dec 9, 2022 | Benchmarking, Disentanglement | Code Available | 1 | 5 |
| Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset | Aug 12, 2024 | Benchmarking | Code Available | 1 | 5 |
| PyRelationAL: a python library for active learning research and development | May 23, 2022 | Active Learning, Benchmarking | Code Available | 1 | 5 |
| PyRobot: An Open-source Robotics Framework for Research and Benchmarking | Jun 19, 2019 | Benchmarking, Robotic Grasping | Code Available | 1 | 5 |
| Automatic sleep stage classification with deep residual networks in a mixed-cohort setting | Aug 21, 2020 | Automatic Sleep Stage Classification, Benchmarking | Code Available | 1 | 5 |
| EgoNormia: Benchmarking Physical Social Norm Understanding | Feb 27, 2025 | Answer Generation, Benchmarking | Code Available | 1 | 5 |
| EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Dec 11, 2023 | Benchmarking, Human-Object Interaction Detection | Code Available | 1 | 5 |
| IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Oct 11, 2018 | Benchmarking | Code Available | 1 | 5 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | May 30, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets | Oct 22, 2020 | Articles, Benchmarking | Code Available | 1 | 5 |
| EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models | Jun 9, 2024 | Benchmarking | Code Available | 1 | 5 |
| Recent Advances on Neural Network Pruning at Initialization | Mar 11, 2021 | Benchmarking, Network Pruning | Code Available | 1 | 5 |
| Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset | Jul 3, 2024 | Benchmarking, Diversity | Code Available | 1 | 5 |
| EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Nov 1, 2023 | Benchmarking, Cryogenic Electron Microscopy (cryo-EM) | Code Available | 1 | 5 |
| Autonomous Microscopy Experiments through Large Language Model Agents | Dec 18, 2024 | Benchmarking, Experimental Design | Code Available | 1 | 5 |
| EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner | Jun 30, 2020 | Benchmarking, Depth Estimation | Code Available | 1 | 5 |
| Autonomous Reinforcement Learning: Formalism and Benchmarking | Dec 17, 2021 | Benchmarking, reinforcement-learning | Code Available | 1 | 5 |
| Introducing Milabench: Benchmarking Accelerators for AI | Nov 18, 2024 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Jun 10, 2025 | Benchmarking, Data Augmentation | Code Available | 1 | 5 |