SOTAVerified

Benchmarking

Papers

Showing 51-60 of 5548 papers

Title | Status | Hype
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI | Code | 4
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics | Code | 4
shapiq: Shapley Interactions for Machine Learning | Code | 4
Benchmarking Automatic Machine Learning Frameworks | Code | 3
Advancing LLM Reasoning Generalists with Preference Trees | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Code | 3
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | Code | 3
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents | Code | 3
CORL: Research-oriented Deep Offline Reinforcement Learning Library | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified