| Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks | Oct 30, 2023 | Benchmarking, object-detection | Code Available | 2 | 5 |
| Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Jan 14, 2023 | Benchmarking | Code Available | 2 | 5 |
| DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Jul 15, 2025 | Benchmarking, Instruction Following | Code Available | 2 | 5 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 | 5 |
| Datasets and Benchmarks for Offline Safe Reinforcement Learning | Jun 15, 2023 | Autonomous Driving, Benchmarking | Code Available | 2 | 5 |
| BARS: Towards Open Benchmarking for Recommender Systems | May 19, 2022 | Benchmarking, Click-Through Rate Prediction | Code Available | 2 | 5 |
| Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Jun 13, 2024 | Benchmarking, GPU | Code Available | 2 | 5 |
| DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Jun 22, 2022 | Benchmarking, Recommendation Systems | Code Available | 2 | 5 |
| DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Jun 24, 2024 | Benchmarking, Image Generation | Code Available | 2 | 5 |
| Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Jul 4, 2024 | Benchmarking, Minecraft | Code Available | 2 | 5 |