SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 71–80 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Automatic Machine Learning Frameworks	Aug 17, 2018	Automated Feature EngineeringAutoML	CodeCode Available	3	5
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving	May 27, 2024	Autonomous DrivingBenchmarking	CodeCode Available	3	5
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models	May 22, 2025	BenchmarkingInstruction Following	CodeCode Available	3	5
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks	Jun 12, 2024	BenchmarkingChatbot	CodeCode Available	3	5
MLVU: Benchmarking Multi-task Long Video Understanding	Jun 6, 2024	BenchmarkingVideo Understanding	CodeCode Available	3	5
BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction	Feb 26, 2025	BenchmarkingTime Series	CodeCode Available	3	5
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis	Oct 9, 2023	BenchmarkingMultivariate Time Series Forecasting	CodeCode Available	3	5
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems	Sep 2, 2024	BenchmarkingInstruction Following	CodeCode Available	3	5
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation	Jun 19, 2024	BenchmarkingImage Generation	CodeCode Available	3	5
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks	Jun 13, 2024	Benchmarking	CodeCode Available	3	5

Show:10 25 50

← PrevPage 8 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified