| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Mar 26, 2024 | Benchmarking, Machine Reading Comprehension | Code Available | 1 | 5 |
| Insights from Benchmarking Frontier Language Models on Web App Code Generation | Sep 8, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | May 30, 2022 | 3D Object Detection, Autonomous Driving | Code Available | 1 | 5 |
| In Search of Lost Online Test-time Adaptation: A Survey | Oct 31, 2023 | Benchmarking, GPU | Code Available | 1 | 5 |
| PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model | Apr 4, 2024 | 3D Part Segmentation, Benchmarking | Code Available | 1 | 5 |
| InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Jan 19, 2025 | Benchmarking, Question Answering | Code Available | 1 | 5 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | May 8, 2025 | Benchmarking, Prompt Engineering | Code Available | 1 | 5 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | May 30, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Jul 8, 2021 | Benchmarking | Code Available | 1 | 5 |