| ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Apr 1, 2023 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 1 | 5 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | May 8, 2025 | Benchmarking, Prompt Engineering | Code Available | 1 | 5 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 | 5 |
| Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking | Jun 17, 2024 | Benchmarking, Demand Forecasting | Code Available | 1 | 5 |
| SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points | Mar 17, 2021 | Activity Recognition, Benchmarking | Code Available | 1 | 5 |
| JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Jul 21, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 | 5 |
| A Critical Assessment of State-of-the-Art in Entity Alignment | Oct 30, 2020 | Benchmarking, Entity Alignment | Code Available | 1 | 5 |
| Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Nov 5, 2024 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Mar 19, 2024 | Benchmarking, feature selection | Code Available | 1 | 5 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | May 10, 2025 | Benchmarking, GPU | Code Available | 1 | 5 |
| Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Jun 1, 2022 | Benchmarking, Sentence | Code Available | 1 | 5 |
| Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | May 13, 2021 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| AQuA: A Benchmarking Tool for Label Quality Assessment | Jun 15, 2023 | Benchmarking, Label Error Detection | Code Available | 1 | 5 |
| APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Dec 25, 2023 | Animal Pose Estimation, Benchmarking | Code Available | 1 | 5 |
| Evaluating histopathology transfer learning with ChampKit | Jun 14, 2022 | Benchmarking, Cell Detection | Code Available | 1 | 5 |
| Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking | Jun 18, 2023 | Benchmarking, Link Prediction | Code Available | 1 | 5 |
| BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Jun 2, 2023 | Benchmarking, Language Acquisition | Code Available | 1 | 5 |
| Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Apr 4, 2020 | Benchmarking, Image Captioning | Code Available | 1 | 5 |
| ISSAFE: Improving Semantic Segmentation in Accidents by Fusing Event-based Data | Aug 20, 2020 | Autonomous Vehicles, Benchmarking | Code Available | 1 | 5 |
| Rethinking Machine Unlearning in Image Generation Models | Jun 3, 2025 | Benchmarking, Image Generation | Code Available | 1 | 5 |
| JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds | Nov 5, 2023 | Autonomous Navigation, Autonomous Vehicles | Code Available | 1 | 5 |
| Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Jul 4, 2024 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks | Jul 26, 2024 | Benchmarking, Model Selection | Code Available | 1 | 5 |
| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Jul 8, 2021 | Benchmarking | Code Available | 1 | 5 |