SOTAVerified

Benchmarking

Papers

Showing 451-460 of 5548 papers

Title | Status | Hype
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection | Code | 1
MatTools: Benchmarking Large Language Models for Materials Science Tools | Code | 1
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Code | 1
Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally | Code | 1
OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions | Code | 1
Towards scalable surrogate models based on Neural Fields for large scale aerodynamic simulations | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | Code | 1
FNBench: Benchmarking Robust Federated Learning against Noisy Labels | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified