| GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Nov 28, 2024 | BenchmarkingObject Counting | CodeCode Available | 2 | 5 |
| Habitat: A Platform for Embodied AI Research | Apr 2, 2019 | BenchmarkingGPU | CodeCode Available | 2 | 5 |
| AlignBench: Benchmarking Chinese Alignment of Large Language Models | Nov 30, 2023 | Benchmarking | CodeCode Available | 2 | 5 |
| GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning | Jul 4, 2025 | BenchmarkingGraph Generation | CodeCode Available | 2 | 5 |
| A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Jan 1, 2024 | Age EstimationBenchmarking | CodeCode Available | 2 | 5 |
| From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking | Jun 24, 2024 | BenchmarkingNeRF | CodeCode Available | 2 | 5 |
| A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Sep 26, 2023 | BenchmarkingMulti-Objective Reinforcement Learning | CodeCode Available | 2 | 5 |
| FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | May 5, 2025 | BenchmarkingMathematical Reasoning | CodeCode Available | 2 | 5 |
| Fortuna: A Library for Uncertainty Quantification in Deep Learning | Feb 8, 2023 | Bayesian InferenceBenchmarking | CodeCode Available | 2 | 5 |
| AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous DrivingBenchmarking | CodeCode Available | 2 | 5 |
| Foundational Models Defining a New Era in Vision: A Survey and Outlook | Jul 25, 2023 | Benchmarking | CodeCode Available | 2 | 5 |
| FedGraph: A Research Library and Benchmark for Federated Graph Learning | Oct 8, 2024 | BenchmarkingFederated Learning | CodeCode Available | 2 | 5 |
| Fast Vision Transformers with HiLo Attention | May 26, 2022 | BenchmarkingEfficient ViTs | CodeCode Available | 2 | 5 |
| FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Feb 20, 2025 | Age EstimationBenchmarking | CodeCode Available | 2 | 5 |
| A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Feb 25, 2019 | BenchmarkingSegmentation | CodeCode Available | 2 | 5 |
| FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models | Jul 1, 2024 | BenchmarkingFairness | CodeCode Available | 2 | 5 |
| FaceScore: Benchmarking and Enhancing Face Quality in Human Generation | Jun 24, 2024 | BenchmarkingDenoising | CodeCode Available | 2 | 5 |
| A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Sep 21, 2024 | BenchmarkingSurvey | CodeCode Available | 2 | 5 |
| Exponentially Faster Language Modelling | Nov 15, 2023 | BenchmarkingCPU | CodeCode Available | 2 | 5 |
| AiTLAS: Artificial Intelligence Toolbox for Earth Observation | Jan 21, 2022 | BenchmarkingEarth Observation | CodeCode Available | 2 | 5 |
| Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis | Mar 4, 2023 | BenchmarkingContrastive Learning | CodeCode Available | 2 | 5 |
| Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Feb 12, 2025 | BenchmarkingLong-Context Understanding | CodeCode Available | 2 | 5 |
| EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Apr 2, 2024 | BenchmarkingReinforcement Learning (RL) | CodeCode Available | 2 | 5 |
| EvalGIM: A Library for Evaluating Generative Image Models | Dec 13, 2024 | BenchmarkingDiversity | CodeCode Available | 2 | 5 |
| Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Apr 3, 2025 | BenchmarkingLogical Reasoning | CodeCode Available | 2 | 5 |