| Benchmarking Automatic Machine Learning Frameworks | Aug 17, 2018 | Automated Feature Engineering, AutoML | Code Available | 3 | 5 |
| Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | May 27, 2024 | Autonomous Driving, Benchmarking | Code Available | 3 | 5 |
| IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models | May 22, 2025 | Benchmarking, Instruction Following | Code Available | 3 | 5 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 | 5 |
| MLVU: Benchmarking Multi-task Long Video Understanding | Jun 6, 2024 | Benchmarking, Video Understanding | Code Available | 3 | 5 |
| BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction | Feb 26, 2025 | Benchmarking, Time Series | Code Available | 3 | 5 |
| Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Oct 9, 2023 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 3 | 5 |
| ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Sep 2, 2024 | Benchmarking, Instruction Following | Code Available | 3 | 5 |
| GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Jun 19, 2024 | Benchmarking, Image Generation | Code Available | 3 | 5 |
| DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Jun 13, 2024 | Benchmarking | Code Available | 3 | 5 |